Method of using a crystal of the N-terminal domain of a signal transducer and activator of transcription

ABSTRACT

The present invention provides a crystal containing the N-terminal domain of a STAT protein that is of sufficient quality to perform X-ray crystallographic studies. Methods of preparing the crystals are included in the invention. The present invention further discloses the three-dimensional structure of the crystal. The present invention also provides methods of using the structural information in drug discovery and drug development.

GOVERNMENTAL SUPPORT

[0001] The research leading to the present invention was supported, at least in part, by NIH Grant Nos. A132489 and A134420. Accordingly, the Government may have certain rights in the invention.

FIELD OF THE INVENTION

[0002] The present invention relates generally to structural studies of STAT proteins, modified STAT proteins and more particularly the N-terminal domain of STAT proteins. Included in the present invention is a crystal of the N-terminal domain of a STAT protein and corresponding structural information obtained by X-ray crystallography. The present invention also relates to methods of using the crystal and related structural information in drug screening assays.

BACKGROUND OF THE INVENTION

[0003] Transcription factors play a major role in cellular function by inducing the transcription of specific mRNAs. Transcription factors, in turn, are controlled by distinct signaling molecules. The STATs (Signal Transducer and Activator of Transcription) constitute a family of transcription factors necessary to activate distinct sets of target genes in response to cytokines and growth factors [Darnell et al. WO 95/08629, (1995)]. The STAT proteins are activated in the cytoplasm by phosphorylation on a single tyrosine residue [Darnell et al., Science 264:1415 (1994)]. The responsible kinases are either ligand-activated transmembrane receptors with intrinsic tyrosine kinase activity, such as EGF- or PDGF-receptors, or cytokine receptors that lack intrinsic kinase activity but have associated JAK kinases, such as those for interferons and interleukins [Ihle, Nature 377:591-594 (1995)]. One distinctive characteristic of the STAT proteins is their apparent lack of requirement for changes in second messenger, e.g., cAMP or Ca⁺⁺, concentrations. Presently, there are seven known mammalian STAT family members. The recent discovery of a Drosophila STAT protein suggests that these proteins have played an important role in signal transduction since the early stages of our evolution [Darnell, PNAS, 94:11767-11769 (1997)].

[0004] Each STAT protein contains a SRC homology domain (SH2 domain). When activated, the STAT proteins are phosphorylated, and form homo- or heterodimeric structures in which the phosphotyrosine of one partner binds to the SH2 domain of the other. The reciprocal SH2-phosphotyrosine interactions between two STAT proteins result in the formation of an active dimer that translocates to the nucleus and activates specific gene expression [Darnell et al., Science 264:1415 (1994)] by binding to a canonical recognition site for the STAT dimer. This canonical recognition site encompasses 9-10 base pairs (TTCN₃₋₄GAA) of DNA [Horvath et al., Genes & Devel. 9:984 (1995); Seidel et al., Proc. Natl. Acad. Sci. USA 92:3041 (1995); Ihle, Cell 84:331 (1996); Mikita et al., Mol. Cell. Biol. 16:5811 (1996)]. Analysis of the binding of activated STATs to DNA targets has revealed that the STAT binding sites can extend over two or more adjacent canonical sites [Xu et al., Science 273:750 (1996); Meier and Groner, Mol. Cell. Biol. 14:128 (1994); Symes et al., Molecular Endocrinology 8:1750 (1994); Dajee et al., Molecular Endocrinology 10:171 (1996); John et al., EMBO J. 15:5627 (1996)].
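By way of illustration only, the canonical recognition site TTCN₃₋₄GAA described above can be expressed as a text pattern and used to locate candidate sites in a DNA sequence. The following is a minimal sketch; the function name and the test sequence are hypothetical and are not taken from any promoter discussed herein.

```python
import re

# Canonical site as written in the text: TTC, then 3 or 4 arbitrary bases, then GAA.
# A lookahead is used so that overlapping candidate sites are all reported.
STAT_SITE = re.compile(r"(?=(TTC[ACGT]{3,4}GAA))")


def find_stat_sites(dna: str) -> list[tuple[int, str]]:
    """Return (0-based start position, matched site) for each candidate site."""
    dna = dna.upper()
    return [(m.start(), m.group(1)) for m in STAT_SITE.finditer(dna)]


# Hypothetical sequence containing one 9 bp site and one 10 bp site.
example = "ggatTTCCCGGAAatgcTTCTAAGGAAcc"
sites = find_stat_sites(example)
for start, site in sites:
    print(start, site, len(site))
```

Such a scan only flags sequences that fit the consensus; it says nothing about whether a given site is a weak or a strong binding site.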

[0005] STAT proteins serve as direct messengers between the cytokine or growth factor receptor present on the cell surface and the cell nucleus. However, since each cytokine and growth factor produces a specific cellular effect by activating a distinct set of genes, the means by which such a limited number of STAT proteins mediate this result remains a mystery. Indeed, at least twenty-five different ligand-receptor complexes signal the nucleus through the seven known mammalian STAT proteins [Yan et al., Cell 84:421-430 (1996)].

[0006] There is increasing evidence that mammalian transcription factors activate transcription and achieve biological specificity by interactions with other transcription factors, trans-activators or the general transcription machinery [McKnight, Genes & Development 10:367 (1996); Roeder, Trends in Biochemical Sciences 21:327 (1996)]. Although the molecular basis for these phenomena is poorly understood, direct protein:protein interactions among multiple promoter bound proteins appear to mediate this synergistic activation [Tjian and Maniatis, Cell 77:5 (1994)].

[0007] In the case of the STATs, a small N-terminal domain has been shown to mediate a number of important protein:protein interactions that influence transcriptional outcome [Leung et al., Science 273:750 (1996); Vinkemeier et al., EMBO J. 15:5616 (1996)]. This domain allows cooperative interactions between STAT dimers bound to adjacent target sites on DNA, leading to a drastically prolonged half-life of the protein-DNA complex [Vinkemeier et al., EMBO J. 15:5616 (1996)]. Functional assays exploring the induction of the hepatic Spi 2.1 gene revealed the necessity for cooperative STAT binding to two adjacent recognition sites for a full growth hormone response [Bergad et al., J. Biol. Chem. 270:24903 (1995)]. In addition, it was observed that these cooperative contacts affect the binding site selection of different STATs on a natural promoter that contains multiple potential STAT recognition sites [Xu et al., Science 273:750 (1996)]. Each of the oligomerized STAT-1, -4, and -5 dimers was shown to bind to a different combination of canonical sites [Xu et al., Science 273:750 (1996)]. Deletion of the N-terminal ~100 residues of STAT-1 and STAT-4 abolishes cooperative binding to DNA [Xu et al., Science 273:750 (1996); Vinkemeier et al., EMBO J. 15:5616 (1996)]. The truncated protein fully retains binding to a single target site as a dimer, suggesting that the N-terminal domain is dispensable for dimer formation and DNA binding [Xu et al., Science 273:750 (1996); Vinkemeier et al., EMBO J. 15:5616 (1996)], but is necessary for interaction between STAT dimers and binding site discrimination [Xu et al., Science 273:750 (1996)]. Also, the N-domain of STAT-1 is required for interaction between STAT-1 and the transcriptional co-activator protein CBP, a large (~2500 amino acids) polypeptide with transacetylase activity [Zhang et al., Proc. Natl. Acad. Sci. USA 93:15092 (1996)]. Additionally, the amino-terminal region of STAT-2 is involved in binding to the intracellular region of the interferon-α receptor [Leung et al., Mol. Cell. Biol. 15:1312 (1995)].

[0008] Therefore, there is a need to obtain agonists and antagonists that can modulate the effect of STAT proteins during specific gene activation. In particular, there is a need to obtain drugs that will directly interact with the important N-terminal domain of STAT proteins. Unfortunately, identification of such drugs has heretofore relied on serendipity and/or systematic screening of large numbers of natural and synthetic compounds. A far superior method of drug screening relies on structure-based drug design. In this case, the three-dimensional structure of a protein or protein fragment is determined and potential agonists and/or potential antagonists are designed with the aid of computer modeling [Bugg et al., Scientific American, December:92-98 (1993); West et al., TIPS, 16:67-74 (1995)]. However, heretofore the three-dimensional structure of a STAT protein or fragment thereof has remained unknown, essentially because no such protein crystals had been produced of sufficient quality to allow the required X-ray crystallographic data to be obtained.

[0009] Therefore, there is presently a need for obtaining an N-terminal STAT domain fragment that can be crystallized to form a crystal with sufficient quality to allow such crystallographic data to be obtained. Further, there is a need for such crystals. Furthermore, there is a need for the determination of the three-dimensional structure of such crystals. Finally, there is a need for procedures for related structure-based drug design based on such crystallographic data.

[0010] The citation of any reference herein should not be construed as an admission that such reference is available as “Prior Art” to the instant application.

SUMMARY OF THE INVENTION

[0011] The present invention provides a crystal containing the N-terminal domain of a STAT protein which effectively diffracts X-rays and thereby allows the determination of the atomic coordinates of the N-terminal domain to a resolution of greater than 5.0 Angstroms. In a preferred embodiment of this type the crystal effectively diffracts X-rays for the determination of the atomic coordinates of the N-terminus to a resolution of greater than 3.0 Angstroms. In a more preferred embodiment of this type the crystal effectively diffracts X-rays for the determination of the atomic coordinates of the N-terminus to a resolution of greater than 2.0 Angstroms.

[0012] In one embodiment the N-terminal domain of the crystal comprises the amino acid sequence of:

[0013] Arg Xaa ^(H)Xaa Leu Xaa Xaa Trp ^(H)Xaa Glu Xaa Gln Xaa Trp (SEQ ID NO: 1), where ^(H)Xaa can be either Ile, Leu, Val, Phe, or Tyr and Xaa can be any amino acid. In another embodiment the crystal of the N-terminal domain of the STAT protein is contained in a STAT fragment that consists of 100 to 150 amino acids. In a preferred embodiment the STAT fragment comprises amino acids 4-112 of SEQ ID NO:2. In a more preferred embodiment the crystal contains an N-terminal domain of a STAT protein comprising amino acid residues 2-123 of SEQ ID NO:2 with 5 additional amino acid residues N-terminal to amino acid residue number 2, i.e., from the N-terminus GLY SER GLY GLY GLY, amino acid residue 2. In one embodiment of this type the crystal effectively diffracts X-rays to allow the determination of the atomic coordinates of the N-terminus to a resolution of 1.45 Angstroms.
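As a hedged illustration, the motif of SEQ ID NO: 1 can be written as a one-letter-code pattern in which each Xaa matches any residue and each ^(H)Xaa matches one of Ile, Leu, Val, Phe, or Tyr. The helper name and the two short test sequences below are hypothetical and serve only to exercise the pattern; they are not SEQ ID NO: 2.

```python
import re

HYDROPHOBIC = "ILVFY"  # residues allowed at the ^(H)Xaa positions

# Arg Xaa HXaa Leu Xaa Xaa Trp HXaa Glu Xaa Gln Xaa Trp
SEQ_ID_1 = re.compile(rf"R.[{HYDROPHOBIC}]L..W[{HYDROPHOBIC}]E.Q.W")


def has_motif(protein: str) -> bool:
    """True if the one-letter protein sequence contains the SEQ ID NO: 1 motif."""
    return SEQ_ID_1.search(protein.upper()) is not None


# Hypothetical sequences for illustration only.
matching = "MSRAILAAWVEAQAWGG"      # Ile and Val at the two hydrophobic positions
non_matching = "MSRAILAAWDEAQAWGG"  # Asp at the second hydrophobic position

print(has_motif(matching), has_motif(non_matching))
```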

[0014] The present invention provides a crystal of the N-terminal domain having a space group of P6₅22 and a unit cell of dimensions a=79.51 Å, b=79.51 Å, and c=84.68 Å. The present invention further provides a crystal of the N-terminal domain having secondary structural elements comprising eight helices (α1-α8) that are assembled into a hook-like structure that has an inner and outer surface. The first four helices (α1-α4) form a ring-shaped element having a proximal and a distal surface, whereas helices six (α6) and seven (α7) form an anti-parallel coiled-coil that also has a proximal and a distal surface. Helix five (α5) connects the ring-shaped element to the anti-parallel coiled-coil, while helix eight (α8) is wrapped around the distal surface of the ring-shaped element. The inner surface of the hook-like structure is formed by the intersection of the proximal surface of the ring-shaped element with the proximal surface of the anti-parallel coiled-coil.
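For orientation, the unit cell volume implied by these dimensions can be estimated with the general triclinic volume formula. The sketch below assumes the conventional hexagonal cell angles (α=β=90°, γ=120°) that go with space group P6₅22. These angles are not stated in the text and are an assumption of the example.

```python
import math


def unit_cell_volume(a, b, c, alpha=90.0, beta=90.0, gamma=90.0):
    """General (triclinic) unit cell volume; lengths in Angstroms, angles in degrees."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)


# Cell lengths from the text; gamma = 120 degrees is assumed for a hexagonal cell.
volume = unit_cell_volume(79.51, 79.51, 84.68, gamma=120.0)
print(f"{volume:.0f} cubic Angstroms")
```

For a hexagonal cell the formula reduces to a²·c·sin(120°), which is roughly 4.6×10⁵ Å³ for the dimensions given.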

[0015] The present invention also provides a method of growing a crystal of an N-terminal domain of a STAT protein. The method of making the crystal comprises placing an aliquot of a solution containing a STAT N-terminal domain fragment of the present invention on a cover slip as a hanging drop above a well containing a reservoir buffer that comprises 0.2 M Na⁺CH₃COO⁻, 0.1 M Tris/HCl pH 8.0, 17% PEG4000. All of the STAT N-terminal domain fragments of the present invention may be used in this manner to prepare such a crystal. In one specific embodiment the solution containing a STAT N-terminal domain fragment is prepared by combining a preparation of the STAT N-terminal domain fragment that contains 20 mg/ml of the STAT N-terminal domain fragment in 50 mM Hepes/HCl pH 8.0, 150 mM KCl, 2.5 mM CaCl₂, and 5 mM DTT with an equal volume of the reservoir buffer. In a preferred embodiment the aliquot comprises approximately 1 μl of the solution. In another preferred embodiment the STAT N-terminal domain fragment comprises the amino acid sequence of:

[0016] Arg Xaa ^(H)Xaa Leu Xaa Xaa Trp ^(H)Xaa Glu Xaa Gln Xaa Trp (SEQ ID NO: 1), where ^(H)Xaa can be either Ile, Leu, Val, Phe, or Tyr. In a more preferred embodiment the N-terminal domain is a STAT-4 N-terminal domain fragment. In a still more preferred embodiment the STAT-4 N-terminal domain fragment contains amino acid residues 2-123 of SEQ ID NO:2 with 5 additional amino acid residues N-terminal to amino acid residue number 2, i.e., from the N-terminus GLY SER GLY GLY GLY, amino acid residue 2.
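The starting composition of the hanging drop follows from the equal-volume mixing described for the specific embodiment above. The sketch below simply averages the component concentrations of the protein preparation and the reservoir buffer by volume. The function name and dictionary keys are illustrative, and the calculation ignores any later change in the drop through vapor diffusion.

```python
def mix(solution_a, solution_b, vol_a=1.0, vol_b=1.0):
    """Volume-weighted concentrations after combining two solutions."""
    total = vol_a + vol_b
    names = set(solution_a) | set(solution_b)
    return {
        name: (solution_a.get(name, 0.0) * vol_a
               + solution_b.get(name, 0.0) * vol_b) / total
        for name in names
    }


# Concentrations as given in the text; units are carried in the (illustrative) keys.
protein_prep = {
    "STAT fragment (mg/ml)": 20.0,
    "Hepes/HCl (mM)": 50.0,
    "KCl (mM)": 150.0,
    "CaCl2 (mM)": 2.5,
    "DTT (mM)": 5.0,
}
reservoir = {
    "Na acetate (M)": 0.2,
    "Tris/HCl (M)": 0.1,
    "PEG4000 (%)": 17.0,
}

drop = mix(protein_prep, reservoir)  # equal volumes, so every component is halved
for name in sorted(drop):
    print(f"{name}: {drop[name]:g}")
```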

[0017] The present invention also provides methods of screening drugs that either enhance or inhibit STAT-STAT dimeric interactions. Such methods include those that identify drugs that affect the interaction of N-terminal domains of STAT proteins that are bound to adjacent DNA binding sites. In one such embodiment, a drug library is screened by assaying the binding activity of a STAT protein to its DNA binding site. This assay is based on the ability of the N-terminal domain of STAT proteins to substantially enhance the binding affinity of two adjacent STAT dimers to a pair of closely aligned DNA binding sites, i.e., binding sites separated by approximately 10 to 15 base pairs. Such drug libraries include phage libraries as described below, chemical libraries compiled by the major drug manufacturers, mixed libraries, and the like. Any of such compounds contained in the drug libraries are suitable for testing as a prospective drug in the assays described below, including in a high throughput assay based on the methods described below.

[0018] An antagonist of the STAT N-terminal dimeric interaction antagonizes one or more aspects of the binding of the STAT dimers to adjacent weak binding sites for the STAT dimers on a promoter of a gene. Such antagonists could be useful as drugs in the treatment of a variety of disease states, including inflammation, allergy, asthma, and leukemias.

[0019] On the other hand, a drug that acts as an agonist stabilizes the N-terminal dimeric interaction between STAT dimers bound to adjacent weak binding sites for the STAT dimers on a promoter of a gene, thereby enhancing STAT function. Such agonists can be used as drugs that are useful in the treatment of anemias, neutropenias, thrombocytopenia, cancer, obesity, viral diseases and growth retardation, or other diseases characterized by an insufficient STAT activity.

[0020] Therefore the present invention provides a method of using the crystals of the present invention in drug screening assays. In one such embodiment the method comprises selecting a potential drug by performing rational drug design with the three-dimensional structure determined for the crystal. The selecting is preferably performed in conjunction with computer modeling. The potential drug is contacted with a dimeric STAT protein N-terminal domain and the binding of the potential drug with the N-terminal domain is detected. A drug is selected which binds to the N-terminal domain of the dimeric STAT protein.

[0021] In a preferred embodiment of this type, contacting the potential drug with a dimeric STAT protein N-terminal domain is performed with dimeric STAT protein fragments containing the STAT protein N-terminal domain. In a more preferred embodiment the STAT protein N-terminal domain has an amino acid sequence comprising SEQ ID NO: 1. In one such embodiment the dimeric STAT protein fragment is labeled. In another such embodiment the dimeric STAT protein fragment is bound to a solid support.

[0022] Another method of using a crystal of the present invention in a drug screening assay comprises selecting a potential drug by performing rational drug design with the three-dimensional structure determined for the crystal. The selecting is preferably performed in conjunction with computer modeling. The potential drug is contacted with two or more dimeric STAT proteins in the presence of a nucleic acid containing at least two adjacent weak binding sites for STAT protein dimers and the effect of the potential drug on the binding of the dimeric STAT proteins to each other and/or to the nucleic acid is detected. A potential drug is selected as a candidate drug when it either enhances or diminishes the binding of the dimeric STAT proteins to each other and/or the nucleic acid. In a preferred embodiment of this type the method further comprises growing a supplemental crystal containing a protein-drug complex formed between the dimeric N-terminal domain of the STAT protein and the candidate drug. A crystal is chosen that effectively diffracts X-rays allowing the determination of the atomic coordinates of the protein-ligand complex to a resolution of greater than 5.0 Angstroms, preferably to a resolution greater than 3.0 Angstroms and more preferably to a resolution greater than 2.0 Angstroms. The three-dimensional structure of the supplemental crystal is determined by molecular replacement analysis and a drug is selected by performing rational drug design with the three-dimensional structure determined for the supplemental crystal. The selecting is preferably performed in conjunction with computer modeling.

[0023] The present invention also provides a method for identifying a drug that modulates the ability of adjacent STAT protein dimers to interact and bind to adjacent DNA binding sites. One such embodiment comprises selecting a potential drug by performing rational drug design with the three-dimensional structure determined for a crystal of the present invention. The selecting is preferably performed in conjunction with computer modeling. The binding affinity of the STAT protein (or of a fragment thereof that comprises the N-terminal domain) for a nucleic acid comprising two adjacent weak STAT DNA binding sites in the presence and absence of the potential drug is determined (i.e., measured). The binding affinity of the STAT protein (or the fragment) for a nucleic acid comprising a single strong STAT binding site in the presence and absence of the potential drug is also determined. Next, the binding affinities measured for the two adjacent weak STAT DNA binding sites in the presence and absence of the potential drug are compared with those determined for the single strong STAT binding site in the presence and absence of the potential drug. A potential drug which causes an increase in the binding affinity measured for the two adjacent weak STAT DNA binding sites but not in the binding affinity measured for the single strong STAT binding site is identified as a drug that enhances the interaction between adjacent activated STAT dimers. On the other hand, a potential drug which causes a decrease in the binding affinity measured for the two adjacent weak STAT DNA binding sites but not in the binding affinity measured for the single strong STAT binding site is identified as a drug that inhibits the interaction between adjacent activated STAT dimers.
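The comparison just described amounts to a simple decision rule, sketched below for illustration. It assumes affinities are expressed so that a larger number means tighter binding (for example, association constants). The `tolerance` threshold for calling a change an increase or a decrease is a hypothetical parameter of the example, not a value taken from the text.

```python
def direction(without_drug: float, with_drug: float, tolerance: float = 0.2) -> str:
    """Classify the drug-induced change in affinity (larger value = tighter binding)."""
    ratio = with_drug / without_drug
    if ratio > 1 + tolerance:
        return "increase"
    if ratio < 1 - tolerance:
        return "decrease"
    return "unchanged"


def classify_drug(weak_without, weak_with, strong_without, strong_with,
                  tolerance=0.2) -> str:
    """Compare the tandem weak-site assay against the single strong-site control."""
    weak = direction(weak_without, weak_with, tolerance)
    strong = direction(strong_without, strong_with, tolerance)
    if weak == "increase" and strong != "increase":
        return "enhances dimer-dimer interaction"
    if weak == "decrease" and strong != "decrease":
        return "inhibits dimer-dimer interaction"
    return "no selective effect"


# Hypothetical affinity values in arbitrary units.
print(classify_drug(1.0, 2.0, 5.0, 5.1))
print(classify_drug(1.0, 0.4, 5.0, 5.0))
print(classify_drug(1.0, 2.0, 5.0, 10.0))
```

The strong-site measurement acts as a control: a compound that shifts binding at both kinds of site is presumably acting on DNA binding in general rather than on the N-terminal interaction between dimers.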

[0024] The present invention further provides a method for identifying a drug that modulates the ability of adjacent STAT protein dimers to interact and bind to adjacent DNA binding sites which comprises measuring the binding affinity of the STAT protein comprising a functional N-terminal domain to a nucleic acid comprising two adjacent weak STAT DNA binding sites in the presence and absence of a potential drug. The binding affinity of a modified form of the STAT protein that lacks a functional N-terminal domain is also determined for the nucleic acid in the presence and absence of the potential drug. The binding affinity measured for the STAT protein in the presence and absence of the potential drug is then compared with the binding affinity measured for the modified STAT protein in the presence and absence of the potential drug. A potential drug which causes an increase in the binding affinity measured for the functional STAT protein but not in the binding affinity measured for the modified STAT protein is identified as a drug that enhances the interaction between adjacent activated STAT dimers, and a potential drug which causes a decrease in the binding affinity measured for the STAT protein but not in the binding affinity measured for the modified STAT protein is identified as a drug that inhibits the interaction between adjacent activated STAT dimers. Variations of these assays are performed in in situ assays as described below with reporter genes operably under the control of weak and/or strong STAT binding sites.

[0025] In one such embodiment the modified STAT protein is a STAT protein that lacks the α4-tryptophan. In another such embodiment the modified STAT protein lacks the α4-glutamic acid. In still another embodiment the modified STAT protein is the truncated STAT protein Stat1tc identified by Vinkemeier et al. [EMBO J. 15:5616 (1996)].

[0026] The present invention further provides a method for identifying a drug that enhances or diminishes the ability of STAT protein dimers to induce the expression of a gene operably under the control of a promoter containing at least two adjacent weak binding sites for STAT protein dimers. One such embodiment comprises selecting a potential drug by performing rational drug design with the three-dimensional structure determined for a crystal of the present invention. The selecting is preferably performed in conjunction with computer modeling. The level of expression of a first reporter gene and a second reporter gene contained by a host cell in the presence and absence of the potential drug is determined. The first reporter gene is operably linked to a first promoter containing at least two adjacent weak binding sites for STAT protein dimers, and the second reporter gene is operably linked to a second promoter comprising at least one strong binding site for a STAT protein dimer. The binding of STAT protein dimers to the two adjacent weak binding sites induces the expression of the first reporter gene, and the binding of the STAT protein dimer to the strong binding site induces the expression of the second reporter gene. In addition the host cell either naturally contains STAT protein dimers or is modified and/or induced to contain them. The level of expression of the first reporter gene is then compared with that of the second reporter gene in the presence and absence of the potential drug. When the presence of the potential drug results in an increase in the level of expression of the first reporter gene but not that of the second reporter gene, the potential drug is identified as a drug that enhances the ability of STAT protein dimers to induce the expression of a gene operably under the control of a promoter containing at least two adjacent weak binding sites for STAT protein dimers. On the other hand, when the presence of a potential drug results in a decrease in the level of expression of the first reporter gene but not that of the second reporter gene, the potential drug is identified as a drug that inhibits the ability of STAT protein dimers to induce the expression of a gene operably under the control of a promoter containing at least two adjacent weak binding sites for STAT protein dimers.

[0027] In a preferred embodiment of this type, the method further comprises growing a supplemental crystal containing a protein-drug complex formed between the dimeric N-terminal domain and the drug. The crystal effectively diffracts X-rays allowing the determination of the atomic coordinates of the protein-ligand complex to a resolution of greater than 5.0 Angstroms, preferably to a resolution of greater than 3.0 Angstroms and more preferably to a resolution of greater than 2.0 Angstroms. The three-dimensional structure of the supplemental crystal is then determined with molecular replacement analysis. A drug is selected by performing rational drug design with the three-dimensional structure determined for the supplemental crystal. The selecting is preferably performed in conjunction with computer modeling.

[0028] In an alternative embodiment, the first reporter gene is contained by a first host cell, and the second reporter gene is contained by a second host cell. In this case, both the first host cell and second host cell contain STAT protein dimers. In one embodiment, the weak STAT binding sites are from sites present in the regulatory regions of the MIG gene. In another embodiment the weak STAT binding sites are from sites present in the regulatory regions of the c-fos gene. In still another embodiment the weak STAT binding sites are from sites present in the regulatory regions of the interferon-γ gene. In a related embodiment the mutated c-fos promoter element, the M67 site [Wagner et al., EMBO J. 9:4477 (1990)], is used as the strong STAT binding site. In another embodiment of this type the strong STAT binding site is the S1 site [Horvath et al., Genes & Devel. 9:984 (1995)]. In still another such embodiment the strong STAT binding site is obtained from the IRF-1 gene promoter. In preferred embodiments, the host cell or host cells are mammalian cells.

[0029] These and other aspects of the present invention will be better appreciated by reference to the following drawings and Detailed Description.

BRIEF DESCRIPTION OF THE DRAWINGS

[0030] FIG. 1A shows a schematic representation of two STAT dimers bound to adjacent target sites. Interactions between N-domains (N, circled) allow the dimers to bind to each other. Phosphotyrosines are indicated as “Y” with encircled P symbols. DBD: DNA binding domain; SH2: SH2 domain; TAD: transactivation domain.

[0031] FIG. 1B shows the sequence alignment of the conserved N-terminal domain of the STAT family and secondary structure of the N-terminal domain of STAT-4. Human (hSTAT), murine (mSTAT), and Drosophila (DSTAT) proteins are included. The numbering is according to STAT-4. α-Helices α1 to α8 are drawn as cylinders. The blackened part of helix α2 indicates a 3₁₀ helix. Invariant residues are highlighted with an asterisk below the alignment. Conserved residues in the hydrophobic core are marked with filled circles above the STAT-4 sequence. The following amino acid exchanges are considered conserved: Q/N, D/E, I/L/V, R/K/H, Y/F, S/T. Residues in helices α6 and α7 that contribute to the packing of the coiled-coil are boxed and their position in the helical repeats is indicated (a or d).

[0032] FIG. 2: Tertiary structure of the amino-terminal domain of STAT-4.

[0033] FIG. 2A depicts the overall representation of two monomers (green and gray) in the crystallographic dimer, viewed approximately orthogonal to the molecular 2-fold axis, which is vertical. The ring-shaped N-terminal element is colored red in one monomer.

[0034] FIG. 2B depicts the orthogonal view of one of the N-terminal domains shown in FIG. 2A, showing details of the architecture of the ring-shaped element. Side chains that participate in a charge stabilized hydrogen-bond network are shown in a ball-and-stick representation. The side chain and backbone carbonyl of buried Arg 31 are shown in magenta. For clarity, the indole ring of the invariant residue Trp 4 that seals off this arrangement on the proximal side is drawn with thinner bonds. The blue sphere denotes a buried water molecule. Hydrogen-bonds are indicated by dotted lines. Oxygen, nitrogen, and carbon atoms are colored red, blue and yellow, respectively. Q3-N marks the position of the backbone amide group of residue Gln 3. The light red colored segment of helix α2 highlights its 3₁₀ helical conformation. FIGS. 2, 3B, and 3C were created with the program RIBBONS v2.0 [Carson, J. Appl. Cryst. 24:958 (1991)].

[0035] FIG. 3A depicts the surface representation of the N-terminal domain indicating the wedge shaped groove and the dimerization interface. Shown are two monomers of a dimer with the left one rotated 90° around the vertical axis away from the original position in the dimer. Note the hook-like appearance of the monomer with the coiled-coil of helices α6 and α7 pointing out of the planar surface formed by the ring-shaped element comprising the amino-terminal 40 residues. Residues from three separate regions of the N-terminal domain make direct or water mediated contacts in the dimer and are color-coded according to their position. Interface residues at the amino-terminus are in green, those in helices α3 and α4 are in blue and amino acids located in helix α6 are yellow. The position of the critical Trp 37 is highlighted in red. The Figure was created using GRASP [Nicholls et al., Proteins: Struct. Funct. and Genetics 11:281 (1991)].

[0036] FIG. 3B shows a view at the dimerization interface with amino acids represented as ball-and-stick models and the Cα backbone as ribbon. The monomer is in the same orientation as the one on the right of the panel of FIG. 3A. Side chains are colored according to their position following the same convention as in FIG. 3A; backbone ribbon is colored as in FIG. 2B with the first 40 residues highlighted in red. Note the entirely polar nature of the interface. Leu 33 makes a backbone carbonyl group contact and its position is represented by the filled circle. In the STAT-4 recombinant N-terminal domain used for crystallization, Met 1 was replaced with Gly plus four additional small amino acids, one of which (Gly-1) is visible in the electron density map (see Methods in the EXAMPLE below). In the crystals the amino terminus of Gly-1 is part of the dimer interface, possibly substituting for the native Met 1.

[0037] FIG. 3C shows the close-up stereo view of the intermolecular hydrogen bonding network in the dimer. Selected side chains surrounding the conserved Trp 37 (magenta) in helices α4 and α6 of two monomers (green and grey) are shown. Trp 37 makes direct (E 66′) and water mediated contacts (Q 63′). Water molecules are depicted as blue spheres.

[0038] FIG. 4 indicates the importance of the invariant residue Trp 37 for STAT-1 tetramerization and mediation of gene activation.

[0039] FIG. 4A depicts the gel mobility shift that shows that a single point mutation disrupts STAT-1 cooperative DNA-binding. Comparison of tetramer stability between wild type (WT) STAT-1 (lanes 3 and 4) and the W37A mutant (lanes 1 and 2). Radiolabeled DNA containing a tandem binding site was preincubated with equal amounts of active protein of either tyrosine-phosphorylated WT-STAT or the mutant protein followed by a chase with excess (30 fold) unlabeled oligonucleotide for the indicated amount of time. The wild type STAT-1 shows the expected formation and stabilization of 2× (dimeric) complexes relative to the dimeric protein/DNA complex. The W37A mutant does form dimeric complexes, but only very little of the slower migrating 2× (dimeric) species. No increased stability of the 2× (dimeric) complex relative to the dimer is observed. Samples loaded at the later time point (15 minutes) were electrophoresed for shorter times and therefore run higher on the gel. The position of unbound oligonucleotide is marked (free).

[0040] FIG. 4B shows the effect of the STAT-1 W37A mutation on interferon-γ (IFN-γ) stimulated gene activation in vivo. U3A cells that are lacking STAT-1 [Miller et al., EMBO J. 12:4221 (1993)] were transfected with expression clones containing either wild type or mutant STAT-1 (see Methods of the EXAMPLE, below) along with a luciferase reporter containing a tandem STAT binding site as an enhancer. After stimulation with IFN-γ for 10 hours, luciferase expression was determined spectroscopically (represented as bars). Cells transfected with the wild type protein show an about 2-fold increase in luciferase expression. In contrast, the tetramerization deficient W37A mutant does not markedly increase transcription over background levels from an enhancer with a tandem binding site. Each bar represents ten individual parallel experiments. Error bars denote standard deviation from the mean.

DETAILED DESCRIPTION OF THE INVENTION

[0041] The present invention provides the first three-dimensional structural information regarding the important family of transcription factors known as STATs. More particularly, the present invention provides a crystalline form of the N-terminal cooperative domain (more simply denoted as the N-terminal domain) of a STAT protein of sufficient quality to perform meaningful X-ray crystallographic measurements. In addition, the present invention provides a method of preparing such crystals.

[0042] In addition, the present invention provides a method of using the crystals and the crystallographic measurements for drug discovery and development. These methods include procedures for screening drugs that either enhance or inhibit STAT-STAT dimer interactions, which can have a critical effect on the transcription of the specific genes under the control of STAT proteins. For example, antagonists of the STAT N-terminal dimer interaction antagonize STAT functions dependent on this aspect of STAT behavior. Drugs that are antagonists would be useful for the treatment of a variety of disease states, including but not limited to, inflammation, allergy, asthma, and leukemias. On the other hand, drugs that are found to be agonists would stabilize the N-terminal interaction between STAT dimers, thereby enhancing this aspect of the STAT function. Such drugs may therefore have utility in the treatment of anemias, neutropenias, thrombocytopenia, cancer, obesity, viral diseases and growth retardation, or other diseases characterized by an insufficient STAT activity.

[0043] Therefore, if appearing herein, the following terms shall have the definitions set out below.

[0044] As used herein the “α4-tryptophan” is the conserved tryptophan common in all STAT proteins found in the N-terminal domain in an alpha helix defined as α4 in FIG. 1. For mStat4 the α4-tryptophan is W₃₇ of the amino acid sequence. Similarly, the α4-glutamic acid is the conserved glutamic acid common in all STAT proteins found in the N-terminal domain in an alpha helix defined as α4 in FIG. 1. For mStat4 the α4-glutamic acid is E₃₉ of the amino acid sequence.

[0045] As used herein the term “STAT protein” includes a particular family of transcription factors consisting of the Signal Transducers and Activators of Transcription proteins. These proteins have been defined in International Patent Publication Nos. WO 93/19179 (Sep. 30, 1993, by James E. Darnell, Jr. et al.), WO 95/08629 (Mar. 30, 1995, by James E. Darnell, Jr. et al.) and United States application having a Ser. No. 08/212,184, filed on Mar. 11, 1994, entitled, “Interferon Associated Receptor Recognition Factors, Nucleic Acids Encoding the Same and Methods of Use Thereof” by James E. Darnell, Jr. et al., all of which are incorporated by reference in their entireties, herein. Currently, there are seven STAT family members which have been identified, numbered STAT 1, 2, 3, 4, 5A, 5B, and 6. STAT proteins include proteins derived from alternative splice sites such as human STAT1α and STAT1β, i.e., STAT1β is a shorter protein than STAT1α and is translated from an alternatively spliced mRNA. Modified STAT proteins and functional fragments of STAT proteins are included in the present invention.

[0046] As used herein the terms “phosphorylated” and “nonphosphorylated” as used in conjunction with or in reference to a STAT protein denote the phosphorylation state of a particular tyrosine residue of the STAT proteins (e.g., Tyr 701 of STAT1). When STAT proteins are phosphorylated, they form homo- or heterodimeric structures in which the phosphotyrosine of one partner binds to the SRC homology domain (SH2) of the other. In their natural environment the newly formed dimer then translocates from the cytoplasm to the nucleus and binds to a palindromic GAS sequence, thereby activating transcription.

[0047] The “N-terminal domain” of a STAT protein is used interchangeably herein with the “N-terminal cooperative domain” and refers to the N-terminal portion of a STAT protein involved in STAT protein dimer-dimer interaction at a weak STAT DNA binding site. Preferably the amino acid sequence of the N-terminal domain comprises SEQ ID NO: 1. In one particular embodiment the STAT protein is STAT-4 comprising amino acids 2-123 of SEQ ID NO:2.

[0048] General Techniques for Constructing Nucleic Acids that Express Recombinant STAT Proteins

[0049] In accordance with the present invention there may be employed conventional molecular biology, microbiology, and recombinant DNA techniques within the skill of the art. Such techniques are explained fully in the literature. See, e.g., Sambrook, Fritsch & Maniatis, Molecular Cloning: A Laboratory Manual, Second Edition (1989) Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (herein “Sambrook et al., 1989”); DNA Cloning: A Practical Approach, Volumes I and II (D. N. Glover ed. 1985); Oligonucleotide Synthesis (M. J. Gait ed. 1984); Nucleic Acid Hybridization [B. D. Hames & S. J. Higgins eds. (1985)]; Transcription And Translation [B. D. Hames & S. J. Higgins, eds. (1984)]; Animal Cell Culture [R. I. Freshney, ed. (1986)]; Immobilized Cells And Enzymes [IRL Press, (1986)]; B. Perbal, A Practical Guide To Molecular Cloning (1984); F. M. Ausubel et al. (eds.), Current Protocols in Molecular Biology, John Wiley & Sons, Inc. (1994).

[0050] Therefore, if appearing herein, the following terms shall have the definitions set out below.

[0051] As used herein, the term “gene” refers to an assembly of nucleotides that encode a polypeptide, and includes cDNA and genomic DNA nucleic acids.

[0052] A “vector” is a replicon, such as plasmid, phage or cosmid, to which another DNA segment may be attached so as to bring about the replication of the attached segment. A “replicon” is any genetic element (e.g., plasmid, chromosome, virus) that functions as an autonomous unit of DNA replication in vivo, i.e., capable of replication under its own control.

[0053] A “cassette” refers to a segment of DNA that can be inserted into a vector at specific restriction sites. The segment of DNA encodes a polypeptide of interest, and the cassette and restriction sites are designed to ensure insertion of the cassette in the proper reading frame for transcription and translation.

[0054] A cell has been “transfected” by exogenous or heterologous DNA when such DNA has been introduced inside the cell.

[0055] A “nucleic acid molecule” refers to the phosphate ester polymeric form of ribonucleosides (adenosine, guanosine, uridine or cytidine; “RNA molecules”) or deoxyribonucleosides (deoxyadenosine, deoxyguanosine, deoxythymidine, or deoxycytidine; “DNA molecules”), or any phosphoester analogues thereof, such as phosphorothioates and thioesters, in either single stranded form, or a double-stranded helix. Double stranded DNA-DNA, DNA-RNA and RNA-RNA helices are possible. The term nucleic acid molecule, and in particular DNA or RNA molecule, refers only to the primary and secondary structure of the molecule, and does not limit it to any particular tertiary forms. Thus, this term includes double-stranded DNA found, inter alia, in linear or circular DNA molecules (e.g., restriction fragments), plasmids, and chromosomes. In discussing the structure of particular double-stranded DNA molecules, sequences may be described herein according to the normal convention of giving only the sequence in the 5′ to 3′ direction along the nontranscribed strand of DNA (i.e., the strand having a sequence homologous to the mRNA). A “recombinant DNA molecule” is a DNA molecule that has undergone a molecular biological manipulation.

[0056] A nucleic acid molecule is “hybridizable” to another nucleic acid molecule, such as a cDNA, genomic DNA, or RNA, when a single stranded form of the nucleic acid molecule can anneal to the other nucleic acid molecule under the appropriate conditions of temperature and solution ionic strength (see Sambrook et al., supra). The conditions of temperature and ionic strength determine the “stringency” of the hybridization. For preliminary screening for homologous nucleic acids, low stringency hybridization conditions, corresponding to a T_(m) of 55° C., can be used (e.g., 5× SSC, 0.1% SDS, 0.25% milk, and no formamide; or 30% formamide, 5× SSC, 0.5% SDS). Moderate stringency hybridization conditions correspond to a higher T_(m), e.g., 40% formamide, with 5× or 6× SSC. High stringency hybridization conditions correspond to the highest T_(m), e.g., 50% formamide, 5× or 6× SSC. Hybridization requires that the two nucleic acids contain complementary sequences, although depending on the stringency of the hybridization, mismatches between bases are possible. The appropriate stringency for hybridizing nucleic acids depends on the length of the nucleic acids and the degree of complementation, variables well known in the art. The greater the degree of similarity or homology between two nucleotide sequences, the greater the value of T_(m) for hybrids of nucleic acids having those sequences. The relative stability (corresponding to higher T_(m)) of nucleic acid hybridizations decreases in the following order: RNA:RNA, DNA:RNA, DNA:DNA. For hybrids of greater than 100 nucleotides in length, equations for calculating T_(m) have been derived (see Sambrook et al., supra, 9.50-9.51). For hybridization with shorter nucleic acids, i.e., oligonucleotides, the position of mismatches becomes more important, and the length of the oligonucleotide determines its specificity (see Sambrook et al., supra, 11.7-11.8). Preferably a minimum length for a hybridizable nucleic acid is at least about 12 nucleotides; preferably at least about 18 nucleotides; and more preferably the length is at least about 27 nucleotides; and most preferably 36 nucleotides.
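By way of illustration only, the dependence of T_(m) on salt, base composition, formamide and hybrid length described above can be sketched numerically. The sketch below uses one widely quoted empirical approximation for DNA:DNA hybrids longer than about 100 nucleotides; the specific coefficients (0.63 per percent formamide, 600 divided by length) are assumptions, since published values differ between references, and the function and parameter names are hypothetical rather than taken from this disclosure.

```python
import math


def gc_percent(seq: str) -> float:
    """Return the G+C content of a nucleotide sequence as a percentage."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def estimate_tm_celsius(gc_pct: float, length_nt: int, na_molar: float,
                        formamide_pct: float = 0.0,
                        formamide_coeff: float = 0.63,   # assumed; varies by reference
                        length_coeff: float = 600.0) -> float:  # assumed; varies by reference
    """Empirical T_m estimate for long DNA:DNA hybrids (illustrative only)."""
    if length_nt <= 0 or na_molar <= 0:
        raise ValueError("length and sodium concentration must be positive")
    return (81.5
            + 16.6 * math.log10(na_molar)      # higher salt stabilizes the hybrid
            + 0.41 * gc_pct                    # G:C pairs raise T_m
            - formamide_coeff * formamide_pct  # formamide lowers T_m
            - length_coeff / length_nt)        # shorter hybrids melt earlier
```

Under these assumptions, raising the formamide percentage at fixed salt lowers the estimated T_(m), which is why a fixed hybridization temperature becomes more stringent as formamide is increased from 30% to 50%.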

[0057] In a specific embodiment, the term “standard hybridization conditions” refers to a T_(m) of 55° C., and utilizes conditions as set forth above. In a preferred embodiment, the T_(m) is 60° C.; in a more preferred embodiment, the T_(m) is 65° C.

[0058] A DNA “coding sequence” is a double-stranded DNA sequence which is transcribed and translated into a polypeptide in a cell in vitro or in vivo when placed under the control of appropriate regulatory sequences. The boundaries of the coding sequence are determined by a start codon at the 5′ (amino) terminus and a translation stop codon at the 3′ (carboxyl) terminus. A coding sequence can include, but is not limited to, prokaryotic sequences and synthetic DNA sequences. If the coding sequence is intended for expression in a eukaryotic cell, a polyadenylation signal and transcription termination sequence will usually be located 3′ to the coding sequence.

[0059] Transcriptional and translational control sequences are DNA regulatory sequences, such as promoters, enhancers, terminators, and the like, that provide for the expression of a coding sequence in a host cell. In eukaryotic cells, polyadenylation signals are control sequences.

[0060] A “promoter sequence” is a DNA regulatory region capable of binding RNA polymerase in a cell and initiating transcription of a downstream (3′ direction) coding sequence. For purposes of defining the present invention, the promoter sequence is bounded at its 3′ terminus by the transcription initiation site and extends upstream (5′ direction) to include the minimum number of bases or elements necessary to initiate transcription at levels detectable above background. Within the promoter sequence will be found a transcription initiation site (conveniently defined, for example, by mapping with nuclease S1), as well as protein binding domains (consensus sequences) responsible for the binding of RNA polymerase.

[0061] A coding sequence is “under the control” of transcriptional and translational control sequences in a cell when RNA polymerase transcribes the coding sequence into mRNA, which is then trans-RNA spliced and translated into the protein encoded by the coding sequence.

[0062] As used herein, the term “homologous” in all its grammatical forms refers to the relationship between proteins that possess a “common evolutionary origin,” including proteins from superfamilies (e.g., the immunoglobulin superfamily) and homologous proteins from different species (e.g., myosin light chain, etc.) (Reeck et al., 1987, Cell 50:667). Such proteins have sequence homology as reflected by their high degree of sequence similarity.

[0063] Accordingly, the term “sequence similarity” in all its grammatical forms refers to the degree of identity or correspondence between nucleic acid or amino acid sequences of proteins that may or may not share a common evolutionary origin (see Reeck et al., supra). However, in common usage and in the instant application, the term “homologous,” when modified with an adverb such as “highly,” may refer to sequence similarity and not a common evolutionary origin.

[0064] The term “corresponding to” is used herein to refer to similar or homologous sequences, whether the exact position is identical or different from the molecule to which the similarity or homology is measured. Thus, the term “corresponding to” refers to the sequence similarity, and not the numbering of the amino acid residues or nucleotide bases.

[0065] A gene encoding a STAT protein, whether genomic DNA or cDNA, can be isolated from any animal source, particularly from a mammal. Methods for obtaining the STAT protein gene are well known in the art, as described above (see, e.g., Sambrook et al., 1989, supra).

[0066] A “heterologous nucleotide sequence” as used herein is a nucleotide sequence that is added to a nucleotide sequence of the present invention by recombinant methods to form a nucleic acid which is not naturally formed in nature. Such nucleic acids can encode chimeric and/or fusion proteins. Thus the heterologous nucleotide sequence can encode peptides and/or proteins which contain regulatory and/or structural properties. In another such embodiment the heterologous nucleotide can encode a protein or peptide that functions as a means of detecting the protein or peptide encoded by the nucleotide sequence of the present invention after the recombinant nucleic acid is expressed. In still another such embodiment the heterologous nucleotide can function as a means of detecting a nucleotide sequence of the present invention. A heterologous nucleotide sequence can comprise non-coding sequences including restriction sites, regulatory sites, promoters and the like.

[0067] The present invention also relates to cloning vectors containing genes encoding analogs and derivatives of the STAT protein, including modified STAT proteins of the invention, that have the same or homologous functional activity as STAT protein, and homologs thereof. The production and use of derivatives and analogs related to the STAT protein are within the scope of the present invention.

[0068] STAT protein derivatives and analogs as described above can be made by altering encoding nucleic acid sequences by substitutions, e.g., replacing the α4-tryptophan or α4-glutamic acid with an alanine, or additions or deletions that provide for functionally equivalent or specifically modified molecules.
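As an illustration of the kind of substitution just described, the following minimal sketch applies a point mutation written in the usual one-letter notation (e.g., W37A) to a protein sequence, after confirming that the expected wild-type residue is actually present at that position. The function name and the toy sequence in the usage note are hypothetical; the toy sequence is not the mStat4 sequence and is chosen only so that a tryptophan falls at position 37 and a glutamic acid at position 39.

```python
import re


def apply_point_mutation(protein_seq: str, mutation: str) -> str:
    """Apply a substitution such as 'W37A' (1-based position) to a protein sequence."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation notation: {mutation!r}")
    wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= len(protein_seq):
        raise ValueError(f"position {position} is outside the sequence")
    # Guard against numbering errors: the residue being replaced must match.
    if protein_seq[position - 1] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, "
            f"found {protein_seq[position - 1]}"
        )
    return protein_seq[:position - 1] + replacement + protein_seq[position:]
```

For example, with the hypothetical sequence `toy = "M" + "G" * 35 + "WQE"`, calling `apply_point_mutation(toy, "W37A")` replaces the tryptophan at position 37 with alanine, whereas `apply_point_mutation(toy, "W38A")` raises an error because position 38 is not a tryptophan.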

[0069] Due to the degeneracy of nucleotide coding sequences, other DNA sequences which encode substantially the same amino acid sequence as a nucleic acid encoding a modified STAT protein or an N-terminal STAT protein fragment of the present invention (including the fragment which lacks the α4-tryptophan) may be used in the practice of the present invention. These include but are not limited to allelic genes, homologous genes from other species, which are altered by the substitution of different codons that encode the same amino acid residue within the sequence, thus producing a silent change. Likewise, the modified STAT protein derivatives of the invention include, but are not limited to, those containing, as a primary amino acid sequence, all or part of the amino acid sequence of a STAT protein including altered sequences in which functionally equivalent amino acid residues are substituted for residues within the sequence resulting in a conservative amino acid substitution. For example, one or more amino acid residues within the sequence can be substituted by another amino acid of a similar polarity, which acts as a functional equivalent, resulting in a silent alteration. Substitutes for an amino acid within the sequence may be selected from other members of the class to which the amino acid belongs. For example, the nonpolar (hydrophobic) amino acids include alanine, leucine, isoleucine, valine, proline, phenylalanine, tryptophan and methionine. Amino acids containing aromatic ring structures are phenylalanine, tryptophan, and tyrosine. The polar neutral amino acids include glycine, serine, threonine, cysteine, tyrosine, asparagine, and glutamine. The positively charged (basic) amino acids include arginine, lysine and histidine. The negatively charged (acidic) amino acids include aspartic acid and glutamic acid.

[0070] Particularly preferred conserved amino acid exchanges are:

[0071] (a) Lys for His or for Arg or vice versa such that a positive charge may be maintained;

[0072] (b) Glu for Asp or vice versa such that a negative charge may be maintained;

[0073] (c) Ser for Thr or vice versa such that a free -OH can be maintained;

[0074] (d) Gln for Asn or vice versa such that a free NH₂ can be maintained;

[0075] (e) Ile for Leu or for Val or vice versa as roughly equivalent hydrophobic amino acids; and

[0076] (f) Phe for Tyr or vice versa as roughly equivalent aromatic amino acids.
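The preferred exchanges (a) through (f) above can be encoded as a simple lookup, for instance when screening a list of candidate substitutions. The sketch below is illustrative only; the names are hypothetical, and the table records only the pairs literally recited in the list (so, for example, Leu for Val is not treated as listed, because item (e) names only exchanges involving Ile).

```python
# Pairs recited in items (a)-(f); each pair is unordered ("or vice versa").
PREFERRED_EXCHANGES = {
    frozenset(pair)
    for pair in [
        ("Lys", "His"), ("Lys", "Arg"),  # (a) positive charge maintained
        ("Glu", "Asp"),                  # (b) negative charge maintained
        ("Ser", "Thr"),                  # (c) free hydroxyl maintained
        ("Gln", "Asn"),                  # (d) free amide NH2 maintained
        ("Ile", "Leu"), ("Ile", "Val"),  # (e) roughly equivalent hydrophobic
        ("Phe", "Tyr"),                  # (f) roughly equivalent aromatic
    ]
}


def is_preferred_exchange(original: str, replacement: str) -> bool:
    """Return True if the substitution is one of the recited preferred exchanges."""
    # An identical residue yields a one-element set, which is never in the table.
    return frozenset((original, replacement)) in PREFERRED_EXCHANGES
```

Using unordered pairs keeps the "or vice versa" language of the list explicit: `is_preferred_exchange("Lys", "Arg")` and `is_preferred_exchange("Arg", "Lys")` give the same answer, while a substitution such as Trp for Ala is not among the listed exchanges.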

[0077] Non-conserved amino acid substitutions may also be introduced to substitute an amino acid with a particularly preferable property. For example, a Cys may be introduced to provide a potential site for disulfide bridges with another Cys. A His may be introduced as a particular “catalytic” site (i.e., His can act as an acid or base and is the most common amino acid in biochemical catalysis). Pro may be introduced because of its particularly planar structure, which induces β-turns in the protein's structure.

[0078] The genes encoding STAT proteins, and derivatives and analogs thereof, can be produced by various methods known in the art. The manipulations which result in their production can occur at the gene or protein level. For example, the cloned N-terminal domain of a STAT protein gene sequence can be modified by any of numerous strategies known in the art (Sambrook et al., 1989, supra). The sequence can be cleaved at appropriate sites with restriction endonuclease(s), followed by further enzymatic modification if desired, isolated, and ligated in vitro. In the production of the gene encoding a derivative or analog of a STAT protein, care should be taken to ensure that the modified gene remains within the same translational reading frame as the STAT protein gene, uninterrupted by translational stop signals, in the gene region where the desired activity is encoded.
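The reading-frame precaution described above lends itself to a simple computational check before a modified construct is made. The following is a minimal sketch, assuming the standard genetic code and a coding sequence supplied in frame from its first codon; the function name and the example sequences in the usage note are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code, DNA alphabet


def check_reading_frame(coding_seq: str) -> list[str]:
    """Return a list of problems found; an empty list means none were detected."""
    seq = coding_seq.upper()
    problems = []
    if len(seq) % 3 != 0:
        problems.append("length is not a multiple of 3 (possible frameshift)")
    # Split into complete codons, ignoring any trailing partial codon.
    complete = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, complete, 3)]
    # The final codon may legitimately be a stop, so only earlier codons are checked.
    for index, codon in enumerate(codons[:-1], start=1):
        if codon in STOP_CODONS:
            problems.append(f"internal stop codon {codon} at codon {index}")
    return problems
```

For instance, `check_reading_frame("ATGGCTTGGTAA")` reports no problems, whereas `check_reading_frame("ATGTAAGCTTGG")` flags the premature TAA at the second codon. A check of this kind detects only frameshifts and premature stops; it says nothing about whether the encoded activity is retained.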

[0079] Additionally, the STAT protein-encoding nucleic acid sequence can be mutated in vitro or in vivo, to create and/or destroy translation, initiation, and/or termination sequences, or to create variations in coding regions and/or form new restriction endonuclease sites or destroy preexisting ones, to facilitate further in vitro modification. Any technique for mutagenesis known in the art can be used, including but not limited to, in vitro site-directed mutagenesis (Hutchinson, C., et al., 1978, J. Biol. Chem. 253:6551; Zoller and Smith, 1984, DNA 3:479-488; Oliphant et al., 1986, Gene 44:177; Hutchinson et al., 1986, Proc. Natl. Acad. Sci. U.S.A. 83:710), use of TAB® linkers (Pharmacia), etc. PCR techniques are preferred for site directed mutagenesis (see Higuchi, 1989, “Using PCR to Engineer DNA”, in PCR Technology: Principles and Applications for DNA Amplification, H. Erlich, ed., Stockton Press, Chapter 6, pp. 61-70).

[0080] The identified and isolated gene can then be inserted into an appropriate cloning vector. A large number of vector-host systems known in the art may be used. Possible vectors include, but are not limited to, plasmids or modified viruses, but the vector system must be compatible with the host cell used. Examples of vectors include, but are not limited to, E. coli, bacteriophages such as lambda derivatives, or plasmids such as pBR322 derivatives or pUC plasmid derivatives, e.g., pGEX vectors, pmal-c, pFLAG, etc. The insertion into a cloning vector can, for example, be accomplished by ligating the DNA fragment into a cloning vector which has complementary cohesive termini. However, if the complementary restriction sites used to fragment the DNA are not present in the cloning vector, the ends of the DNA molecules may be enzymatically modified. Alternatively, any site desired may be produced by ligating nucleotide sequences (linkers) onto the DNA termini; these ligated linkers may comprise specific chemically synthesized oligonucleotides encoding restriction endonuclease recognition sequences. Recombinant molecules can be introduced into host cells via transformation, transfection, infection, electroporation, etc., so that many copies of the gene sequence are generated. Preferably, the cloned gene is contained on a shuttle vector plasmid, which provides for expansion in a cloning cell, e.g., E. coli, and facile purification for subsequent insertion into an appropriate expression cell line, if such is desired. For example, a shuttle vector, which is a vector that can replicate in more than one type of organism, can be prepared for replication in both E. coli and Saccharomyces cerevisiae by linking sequences from an E. coli plasmid with sequences from the yeast 2 μ plasmid.

[0081] In an alternative method, the desired gene may be identified and isolated after insertion into a suitable cloning vector in a “shot gun” approach. Enrichment for the desired gene, for example, by size fractionation, can be done before insertion into the cloning vector.

Expression of STAT Proteins

[0082] The nucleotide sequence coding for a STAT protein, or functional fragment, including the N-terminal peptide fragment of a STAT protein, derivatives or analogs thereof, including a chimeric protein, thereof, can be inserted into an appropriate expression vector, i.e., a vector which contains the necessary elements for the transcription and translation of the inserted protein-coding sequence. Such elements are termed herein a “promoter.” Thus, the nucleic acid encoding a STAT protein of the invention or functional fragment, including the N-terminal peptide fragment of a STAT protein, derivatives or analogs thereof, is operationally associated with a promoter in an expression vector of the invention. Both cDNA and genomic sequences can be cloned and expressed under control of such regulatory sequences. An expression vector also preferably includes a replication origin. The necessary transcriptional and translational signals can be provided on a recombinant expression vector. As detailed below, all genetic manipulations described for the STAT gene in this section may also be employed for genes encoding a functional fragment, including the N-terminal domain peptide fragment of a STAT protein, derivatives or analogs thereof, including a chimeric protein, thereof.

[0083] Potential host-vector systems include but are not limited to mammalian cell systems infected with virus (e.g., vaccinia virus, adenovirus, etc.); insect cell systems infected with virus (e.g., baculovirus); microorganisms such as yeast containing yeast vectors; or bacteria transformed with bacteriophage, DNA, plasmid DNA, or cosmid DNA. The expression elements of vectors vary in their strengths and specificities. Depending on the host-vector system utilized, any one of a number of suitable transcription and translation elements may be used.

[0084] A recombinant STAT protein of the invention may be expressed chromosomally, after integration of the coding sequence by recombination. In this regard, any of a number of amplification systems may be used to achieve high levels of stable gene expression (see Sambrook et al., 1989, supra).

[0085] The cell containing the recombinant vector comprising the nucleic acid encoding STAT protein is cultured in an appropriate cell culture medium under conditions that provide for expression of STAT protein by the cell.

[0086] Any of the methods previously described for the insertion of DNA fragments into a cloning vector may be used to construct expression vectors containing a gene consisting of appropriate transcriptional/translational control signals and the protein coding sequences. These methods may include in vitro recombinant DNA and synthetic techniques and in vivo recombination (genetic recombination).

[0087] Expression of STAT protein may be controlled by any promoter/enhancer element known in the art, but these regulatory elements must be functional in the host selected for expression.

[0088] Expression vectors containing a nucleic acid encoding a STAT protein of the invention can be identified by four general approaches: (a) PCR amplification of the desired plasmid DNA or specific mRNA, (b) nucleic acid hybridization, (c) presence or absence of selection marker gene functions, and (d) expression of inserted sequences. In the first approach, the nucleic acids can be amplified by PCR to provide for detection of the amplified product. In the second approach, the presence of a foreign gene inserted in an expression vector can be detected by nucleic acid hybridization using probes comprising sequences that are homologous to an inserted marker gene. In the third approach, the recombinant vector/host system can be identified and selected based upon the presence or absence of certain “selection marker” gene functions (e.g., β-galactosidase activity, thymidine kinase activity, resistance to antibiotics, transformation phenotype, occlusion body formation in baculovirus, etc.) caused by the insertion of foreign genes in the vector. In another example, if the nucleic acid encoding STAT protein is inserted within the “selection marker” gene sequence of the vector, recombinants containing the STAT protein insert can be identified by the absence of the selection marker gene function. In the fourth approach, recombinant expression vectors can be identified by assaying for the activity, biochemical, or immunological characteristics of the gene product expressed by the recombinant, provided that the expressed protein assumes a functionally active conformation.

[0089] A wide variety of host/expression vector combinations may be employed in expressing the DNA sequences of this invention. Useful expression vectors, for example, may consist of segments of chromosomal, nonchromosomal and synthetic DNA sequences. Suitable vectors include derivatives of SV40 and known bacterial plasmids, e.g., E. coli plasmids col E1, pCR1, pBR322, pMal-C2, pET, pGEX (Smith et al., 1988, Gene 67:31-40), pMB9 and their derivatives, plasmids such as RP4; phage DNAs, e.g., the numerous derivatives of phage λ, e.g., NM989, and other phage DNA, e.g., M13 and filamentous single stranded phage DNA; yeast plasmids such as the 2 μ plasmid or derivatives thereof; vectors useful in eukaryotic cells, such as vectors useful in insect or mammalian cells; vectors derived from combinations of plasmids and phage DNAs, such as plasmids that have been modified to employ phage DNA or other expression control sequences; and the like.

[0090] For example, in a baculovirus expression system, both non-fusion transfer vectors, such as but not limited to pVL941 (BamH1 cloning site; Summers), pVL1393 (BamH1, SmaI, XbaI, EcoR1, NotI, XmaIII, BglII, and PstI cloning site; Invitrogen), pVL1392 (BglII, PstI, NotI, XmaIII, EcoRI, XbaI, SmaI, and BamH1 cloning site; Summers and Invitrogen), and pBlueBacIII (BamH1, BglII, PstI, NcoI, and HindIII cloning site, with blue/white recombinant screening possible; Invitrogen), and fusion transfer vectors, such as but not limited to pAc700 (BamH1 and KpnI cloning site, in which the BamH1 recognition site begins with the initiation codon; Summers), pAc701 and pAc702 (same as pAc700, with different reading frames), pAc360 (BamH1 cloning site 36 base pairs downstream of a polyhedrin initiation codon; Invitrogen(195)), and pBlueBacHisA, B, C (three different reading frames, with BamH1, BglII, PstI, NcoI, and HindIII cloning site, an N-terminal peptide for ProBond purification, and blue/white recombinant screening of plaques; Invitrogen (220)) can be used. Mammalian expression vectors contemplated for use in the invention include vectors with inducible promoters, such as the dihydrofolate reductase (DHFR) promoter, e.g., any expression vector with a DHFR expression vector, or a DHFR/methotrexate co-amplification vector, such as pED (PstI, SalI, SbaI, SmaI, and EcoRI cloning site, with the vector expressing both the cloned gene and DHFR; see Kaufman, Current Protocols in Molecular Biology, 16.12 (1991)). Alternatively, a glutamine synthetase/methionine sulfoximine co-amplification vector can be used, such as pEE14 (HindIII, XbaI, SmaI, SbaI, EcoRI, and BclI cloning site, in which the vector expresses glutamine synthetase and the cloned gene; Celltech). In another embodiment, a vector that directs episomal expression under control of Epstein Barr Virus (EBV) can be used, such as pREP4 (BamH1, SfiI, XhoI, NotI, NheI, HindIII, NheI, PvuII, and KpnI cloning site, constitutive RSV-LTR promoter, hygromycin selectable marker; Invitrogen), pCEP4 (BamH1, SfiI, XhoI, NotI, NheI, HindIII, NheI, PvuII, and KpnI cloning site, constitutive hCMV immediate early gene, hygromycin selectable marker; Invitrogen), pMEP4 (KpnI, PvuI, NheI, HindIII, NotI, XhoI, SfiI, BamH1 cloning site, inducible metallothionein IIa gene promoter, hygromycin selectable marker; Invitrogen), pREP8 (BamH1, XhoI, NotI, HindIII, NheI, and KpnI cloning site, RSV-LTR promoter, histidinol selectable marker; Invitrogen), pREP9 (KpnI, NheI, HindIII, NotI, XhoI, SfiI, and BamHI cloning site, RSV-LTR promoter, G418 selectable marker; Invitrogen), and pEBVHis (RSV-LTR promoter, hygromycin selectable marker, N-terminal peptide purifiable via ProBond resin and cleaved by enterokinase; Invitrogen). Selectable mammalian expression vectors for use in the invention include pRc/CMV (HindIII, BstXI, NotI, SbaI, and ApaI cloning site, G418 selection; Invitrogen), pRc/RSV (HindIII, SpeI, BstXI, NotI, XbaI cloning site, G418 selection; Invitrogen), and others. Vaccinia virus mammalian expression vectors (see, Kaufman, 1991, supra) for use according to the invention include but are not limited to pSC11 (SmaI cloning site, TK- and β-gal selection), pMJ601 (SalI, SmaI, AflI, NarI, BspMII, BamH1, ApaI, NheI, SacII, KpnI, and HindIII cloning site; TK- and β-gal selection), and pTKgptF1S (EcoRI, PstI, SalI, AccI, HindII, SbaI, BamH1, and Hpa cloning site, TK or XPRT selection).

[0091] Yeast expression systems can also be used according to the invention to express a STAT protein. For example, the non-fusion pYES2 vector (XbaI, SphI, ShoI, NotI, GstXI, EcoRI, BstXI, BamH1, SacI, KpnI, and HindIII cloning site; Invitrogen) or the fusion pYESHisA, B, C (XbaI, SphI, ShoI, NotI, BstXI, EcoRI, BamH1, SacI, KpnI, and HindIII cloning site, N-terminal peptide purified with ProBond resin and cleaved with enterokinase; Invitrogen), to mention just two, can be employed according to the present invention.

[0092] Once a particular recombinant DNA molecule is identified and isolated, several methods known in the art may be used to propagate it. Once a suitable host system and growth conditions are established, recombinant expression vectors can be propagated and prepared in quantity. As previously explained, the expression vectors which can be used include, but are not limited to, the following vectors or their derivatives: human or animal viruses such as vaccinia virus or adenovirus; insect viruses such as baculovirus; yeast vectors; bacteriophage vectors (e.g., lambda), and plasmid and cosmid DNA vectors, to name but a few.

[0093] Vectors are introduced into the desired host cells by methods known in the art, e.g., transfection, electroporation, microinjection, transduction, cell fusion, DEAE dextran, calcium phosphate precipitation, lipofection (lysosome fusion), use of a gene gun, or a DNA vector transporter (see, e.g., Wu et al., 1992, J. Biol. Chem. 267:963-967; Wu and Wu, 1988, J. Biol. Chem. 263:14621-14624; Hartmut et al., Canadian Patent Application No. 2,012,311, filed Mar. 15, 1990).

Synthetic Polypeptides

[0094] The term “polypeptide” is used in its broadest sense to refer to a compound of two or more subunit amino acids, amino acid analogs, or peptidomimetics. The subunits are linked by peptide bonds. The STAT proteins, and more particularly the N-terminal domain fragments thereof, of the present invention may be chemically synthesized.

[0095] More particularly, potential drugs that may be tested in the drug screening assays of the present invention may also be chemically synthesized. Synthetic polypeptides, prepared using the well known techniques of solid phase, liquid phase, or peptide condensation techniques, or any combination thereof, can include natural and unnatural amino acids. Amino acids used for peptide synthesis may be standard Boc (N^(α)-amino protected N^(α)-t-butyloxycarbonyl) amino acid resin with the standard deprotecting, neutralization, coupling and wash protocols of the original solid phase procedure of Merrifield (1963, J. Am. Chem. Soc. 85:2149-2154), or the base-labile N^(α)-amino protected 9-fluorenylmethoxycarbonyl (Fmoc) amino acids first described by Carpino and Han (1972, J. Org. Chem. 37:3403-3409). Both Fmoc and Boc N^(α)-amino protected amino acids can be obtained from Fluka, Bachem, Advanced Chemtech, Sigma, Cambridge Research Biochemical, or Peninsula Labs or other chemical companies familiar to those who practice this art. In addition, the method of the invention can be used with other N^(α)-protecting groups that are familiar to those skilled in this art. Solid phase peptide synthesis may be accomplished by techniques familiar to those in the art and provided, for example, in Stewart and Young, 1984, Solid Phase Synthesis, Second Edition, Pierce Chemical Co., Rockford, Ill.; Fields and Noble, 1990, Int. J. Pept. Protein Res. 35:161-214, or using automated synthesizers, such as sold by ABS. Thus, polypeptides of the invention may comprise D-amino acids, a combination of D- and L-amino acids, and various “designer” amino acids (e.g., β-methyl amino acids, Cα-methyl amino acids, and Nα-methyl amino acids, etc.) to convey special properties. Synthetic amino acids include ornithine for lysine, fluorophenylalanine for phenylalanine, and norleucine for leucine or isoleucine. Additionally, by assigning specific amino acids at specific coupling steps, α-helices, β turns, β sheets, γ-turns, and cyclic peptides can be generated.

[0096] In a further embodiment, subunits of peptides that confer useful chemical and structural properties will be chosen. For example, peptides comprising D-amino acids will be resistant to L-amino acid-specific proteases in vivo. In addition, the present invention envisions preparing peptides that have better defined structural properties, and the use of peptidomimetics, and peptidomimetic bonds, such as ester bonds, to prepare peptides with novel properties. In another embodiment, a peptide may be generated that incorporates a reduced peptide bond, i.e., R₁-CH₂-NH-R₂, where R₁ and R₂ are amino acid residues or sequences. A reduced peptide bond may be introduced as a dipeptide subunit. Such a molecule would be resistant to peptide bond hydrolysis, e.g., protease activity. Such peptides would provide ligands with unique function and activity, such as extended half-lives in vivo due to resistance to metabolic breakdown or protease activity. Furthermore, it is well known that in certain systems constrained peptides show enhanced functional activity (Hruby, 1982, Life Sciences 31:189-199; Hruby et al., 1990, Biochem. J. 268:249-262); the present invention provides a method to produce a constrained peptide that incorporates random sequences at all other positions.

[0097] Constrained and Cyclic Peptides.

[0098] A constrained, cyclic or rigidized peptide may be prepared synthetically, provided that in at least two positions in the sequence of the peptide an amino acid or amino acid analog is inserted that provides a chemical functional group capable of crosslinking to constrain, cyclize or rigidize the peptide after treatment to form the crosslink. Cyclization will be favored when a turn-inducing amino acid is incorporated. Examples of amino acids capable of crosslinking a peptide are cysteine to form disulfides, aspartic acid to form a lactone or a lactam, and a chelator such as γ-carboxyl-glutamic acid (Gla) (Bachem) to chelate a transition metal and form a cross-link. Protected γ-carboxyl glutamic acid may be prepared by modifying the synthesis described by Zee-Cheng and Olson (1980, Biochem. Biophys. Res. Commun. 94:1128-1132). A peptide in which the peptide sequence comprises at least two amino acids capable of crosslinking may be treated, e.g., by oxidation of cysteine residues to form a disulfide or addition of a metal ion to form a chelate, so as to crosslink the peptide and form a constrained, cyclic or rigidized peptide.

[0099] The present invention provides strategies to systematically prepare cross-links. For example, if four cysteine residues are incorporated in the peptide sequence, different protecting groups may be used (Hiskey, 1981, in The Peptides: Analysis, Synthesis, Biology, Vol. 3, Gross and Meienhofer, eds., Academic Press: New York, pp. 137-167; Ponsanti et al., 1990, Tetrahedron 46:8255-8266). The first pair of cysteines may be deprotected and oxidized, then the second set may be deprotected and oxidized. In this way a defined set of disulfide cross-links may be formed. Alternatively, a pair of cysteines and a pair of chelating amino acid analogs may be incorporated so that the cross-links are of a different chemical nature.

[0100] Non-Classical Amino Acids that Induce Conformational Constraints.

[0101] The following non-classical amino acids may be incorporated in the peptide in order to introduce particular conformational motifs: 1,2,3,4-tetrahydroisoquinoline-3-carboxylate (Kazmierski et al., 1991, J. Am. Chem. Soc. 113:2275-2283); (2S,3S)-methyl-phenylalanine, (2S,3R)-methyl-phenylalanine, (2R,3S)-methyl-phenylalanine and (2R,3R)-methyl-phenylalanine (Kazmierski and Hruby, 1991, Tetrahedron Lett.); 2-aminotetrahydronaphthalene-2-carboxylic acid (Landis, 1989, Ph.D. Thesis, University of Arizona); hydroxy-1,2,3,4-tetrahydroisoquinoline-3-carboxylate (Miyake et al., 1989, J. Takeda Res. Labs. 43:53-76); β-carboline (D and L) (Kazmierski, 1988, Ph.D. Thesis, University of Arizona); HIC (histidine isoquinoline carboxylic acid) (Zechel et al., 1991, Int. J. Pep. Protein Res. 43); and HIC (histidine cyclic urea) (Dharanipragada).

[0102] The following amino acid analogs and peptidomimetics may be incorporated into a peptide to induce or favor specific secondary structures: LL-Acp (LL-3-amino-2-propenidone-6-carboxylic acid), a β-turn inducing dipeptide analog (Kemp et al., 1985, J. Org. Chem. 50:5834-5838); β-sheet inducing analogs (Kemp et al., 1988, Tetrahedron Lett. 29:5081-5082); β-turn inducing analogs (Kemp et al., 1988, Tetrahedron Lett. 29:5057-5060); α-helix inducing analogs (Kemp et al., 1988, Tetrahedron Lett. 29:4935-4938); γ-turn inducing analogs (Kemp et al., 1989, J. Org. Chem. 54:109-115); and analogs provided by the following references: Nagai and Sato, 1985, Tetrahedron Lett. 26:647-650; DiMaio et al., 1989, J. Chem. Soc. Perkin Trans. p. 1687; also a Gly-Ala turn analog (Kahn et al., 1989, Tetrahedron Lett. 30:2317); amide bond isostere (Jones et al., 1988, Tetrahedron Lett. 29:3853-3856); tetrazole (Zabrocki et al., 1988, J. Am. Chem. Soc. 110:5875-5880); DTC (Samanen et al., 1990, Int. J. Protein Pep. Res. 35:501-509); and analogs taught in Olson et al., 1990, J. Am. Chem. Soc. 112:323-333 and Garvey et al., 1990, J. Org. Chem. 56:436. Conformationally restricted mimetics of beta turns and beta bulges, and peptides containing them, are described in U.S. Pat. No. 5,440,013, issued Aug. 8, 1995 to Kahn.

Crystals of the N-terminal Domain Fragment of a STAT Protein

[0103] Crystals of the N-terminal domain fragment of a STAT protein can be grown by a number of techniques including batch crystallization, vapor diffusion (either by sitting drop or hanging drop) and microdialysis. Seeding of the crystals in some instances is required to obtain X-ray quality crystals. Standard micro and/or macro seeding of crystals may therefore be used. As exemplified below, hanging drops of one μl of the STAT-4 N-terminal domain fragment (20 mg/ml, in 50 mM Hepes/HCl pH 8.0, 150 mM KCl, 2.5 mM CaCl₂, 5 mM DTT) were mixed with equal volumes of reservoir buffer containing 0.2 M Na⁺CH₃COO⁻, 0.1 M Tris/HCl pH 8.0, 17% PEG4000. Hexagonal crystals (0.2×0.2×0.2 mm) were routinely grown overnight at 20° C. Once a crystal of the present invention is grown, X-ray diffraction data can be collected. The Example below used a MAR imaging plate detector for X-ray diffraction data collection, though alternative methods may also be used. For example, crystals can be characterized by using X-rays produced in a conventional source (such as a sealed tube or a rotating anode) or using a synchrotron source. Methods of characterization include, but are not limited to, precession photography, oscillation photography and diffractometer data collection. Data collection on heavy atom derivatives, such as those produced with a mercurial as exemplified below, can be performed using Fuji imaging plates. Alternatively, the STAT fragment can be synthesized with selenium-methionine (Se-Met) in place of methionine, and the Se-Met multiwavelength anomalous dispersion data [Hendrickson, Science, 254:51-58 (1991)] can be collected on CHESS F2, using reverse-beam geometry to record Friedel pairs at four X-ray wavelengths, corresponding to two remote points above and below the Se absorption edge (λ₁ and λ₄) and the absorption edge inflection point (λ₂) and peak (λ₃). Selenium sites can be located using SHELXS-90 in Patterson search mode (G. M. Sheldrick). Experimental phases (α_(MAD)) can be estimated via a multiple isomorphous replacement/anomalous scattering strategy using MLPHARE (Z. Otwinowski, Southwestern University of Texas, Dallas) with, for example, three of the wavelengths treated as derivatives and one (λ₂) treated as the parent. In either case, data can be processed using HKL, DENZO and SCALEPACK (Z. Otwinowski and W. Minor).
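The drop composition described above can be illustrated with a short calculation. The sketch below is only an illustration under the assumption of ideal mixing at the moment the drop is set up, before any vapor equilibration with the reservoir; the function name and dictionary labels are hypothetical and are not part of the disclosure.

```python
def mix_drop(protein_solution, reservoir, protein_vol_ul=1.0, reservoir_vol_ul=1.0):
    """Return component concentrations right after mixing the two solutions.

    Assumes ideal mixing; each concentration is scaled by its volume fraction.
    """
    total = protein_vol_ul + reservoir_vol_ul
    drop = {}
    for name, conc in protein_solution.items():
        drop[name] = drop.get(name, 0.0) + conc * protein_vol_ul / total
    for name, conc in reservoir.items():
        drop[name] = drop.get(name, 0.0) + conc * reservoir_vol_ul / total
    return drop


# Concentrations taken from the conditions stated in the text.
protein_solution = {
    "STAT-4 N-terminal domain (mg/ml)": 20.0,
    "Hepes/HCl (mM)": 50.0,
    "KCl (mM)": 150.0,
    "CaCl2 (mM)": 2.5,
    "DTT (mM)": 5.0,
}
reservoir = {
    "sodium acetate (M)": 0.2,
    "Tris/HCl (M)": 0.1,
    "PEG4000 (%)": 17.0,
}

drop = mix_drop(protein_solution, reservoir)
for component, value in drop.items():
    print(f"{component}: {value:g}")
```

With equal volumes, every component starts at half its stock concentration in the drop; the precipitant then concentrates as water vapor diffuses toward the reservoir.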

[0104] In addition, X-PLOR, as used in the Example below [Brünger, X-PLOR v. 3.1 Manual, New Haven: Yale University, (1993B)] or Heavy [T. Terwilliger, Los Alamos National Laboratory] may be utilized for bulk solvent correction and B-factor scaling. After density modification and non-crystallographic averaging, the protein is built into an electron density map using the program O, as exemplified below [Jones et al., Acta Cryst., A47:110-119 (1991)]. Model building interspersed with positional and simulated annealing refinement [Brünger, 1993B, supra] can permit an unambiguous trace and sequence assignment of the N-terminal domain fragment of the STAT protein.

Protein-Structure Based Design of Agonists and Antagonists of STAT Proteins

[0105] Once the three-dimensional structure of a crystal comprising an N-terminal domain fragment of a STAT protein is determined, a potential ligand (antagonist or agonist) is examined through the use of computer modeling using a docking program such as GRAM, DOCK, or AUTODOCK [Dunbrack et al., 1997, supra]. This procedure can include computer fitting of potential ligands to the STAT dimer to ascertain how well the shape and the chemical structure of the potential ligand will complement or interfere with the dimer-dimer interaction [Bugg et al., Scientific American, December:92-98 (1993); West et al., TIPS, 16:67-74 (1995)]. Computer programs can also be employed to estimate the attraction, repulsion, and steric hindrance of the ligand to the dimer-dimer binding site. Generally the tighter the fit (e.g., the lower the steric hindrance, and/or the greater the attractive force) the more potent the potential drug will be, since these properties are consistent with a tighter binding constant. Furthermore, the more specificity in the design of a potential drug, the more likely that the drug will not interfere with other properties of the STAT protein or other proteins (particularly proteins present in the nucleus). This will minimize potential side-effects due to unwanted interactions with other proteins.

[0106] Initially a potential ligand could be obtained by screening, for example, a random peptide library produced by recombinant bacteriophage [Scott and Smith, Science, 249:386-390 (1990); Cwirla et al., Proc. Natl. Acad. Sci., 87:6378-6382 (1990); Devlin et al., Science, 249:404-406 (1990)] or a chemical library. A ligand selected in this manner could then be systematically modified by computer modeling programs until one or more promising potential ligands are identified. Such analysis has been shown to be effective in the development of HIV protease inhibitors [Lam et al., Science 263:380-384 (1994); Wlodawer et al., Ann. Rev. Biochem. 62:543-585 (1993); Appelt, Perspectives in Drug Discovery and Design 1:23-48 (1993); Erickson, Perspectives in Drug Discovery and Design 1:109-128 (1993)].

[0107] Such computer modeling allows the selection of a finite number of rational chemical modifications, as opposed to the countless number of essentially random chemical modifications that could be made, any one of which might lead to a useful drug. Each chemical modification requires additional chemical steps, which, while reasonable for the synthesis of a finite number of compounds, quickly become overwhelming if all possible modifications need to be synthesized. Thus through the use of the three-dimensional structure disclosed herein and computer modeling, a large number of these compounds can be rapidly screened on the computer monitor screen, and a few likely candidates can be determined without the laborious synthesis of untold numbers of compounds.

[0108] Once a potential ligand (agonist or antagonist) is identified it can be either selected from a library of chemicals as are commercially available from most large chemical companies including Merck, Glaxo Wellcome, Bristol-Myers Squibb, Monsanto/Searle, Eli Lilly, Novartis and Pharmacia Upjohn, or alternatively the potential ligand may be synthesized de novo. As mentioned above, the de novo synthesis of one or even a relatively small group of specific compounds is reasonable in the art of drug design. The prospective drug can be placed into any standard binding assay described below to test its effect on the dimeric N-terminal domain STAT-STAT interaction.

[0109] When a suitable drug is identified, a supplemental crystal can be grown which comprises a protein-ligand complex formed between an N-terminal domain of the STAT protein and the drug. Preferably the crystal effectively diffracts X-rays allowing the determination of the atomic coordinates of the protein-ligand complex to a resolution of greater than 5.0 Angstroms, more preferably greater than 3.0 Angstroms, and even more preferably greater than 2.0 Angstroms. The three-dimensional structure of the supplemental crystal can be determined by Molecular Replacement Analysis. Molecular replacement involves using a known three-dimensional structure as a search model to determine the structure of a closely related molecule or protein-ligand complex in a new crystal form. The measured X-ray diffraction properties of the new crystal are compared with the search model structure to compute the position and orientation of the protein in the new crystal. Computer programs that can be used include X-PLOR and AMoRe [J. Navaza, Acta Crystallographica A50:157-163 (1994)]. Once the position and orientation are known, an electron density map can be calculated using the search model to provide X-ray phases. Thereafter, the electron density is inspected for structural differences and the search model is modified to conform to the new structure. Using this approach, it will be possible to use the claimed structure of the N-terminal domain STAT fragment to solve the three-dimensional structures of any such STAT fragment. Other computer programs that can be used to solve the structures of such STAT crystals include QUANTA, CHARMM, INSIGHT, SYBYL, MACROMODEL, and ICM.

[0110] For all of the drug screening assays described herein, further refinements to the structure of the drug will generally be necessary and can be made by the successive iterations of any and/or all of the steps provided by the particular drug screening assay.

[0111] Phage Libraries for Drug Screening.

[0112] Phage libraries have been constructed which when infected into host E. coli produce random peptide sequences of approximately 10 to 15 amino acids [Parmley and Smith, Gene 73:305-318 (1988), Scott and Smith, Science 249:386-390 (1990)]. Specifically, the phage library can be mixed in low dilutions with permissive E. coli in low melting point LB agar which is then poured on top of LB agar plates. After incubating the plates at 37° C. for a period of time, small clear plaques in a lawn of E. coli will form which represent active phage growth and lysis of the E. coli. A representative of these phages can be absorbed to nylon filters by placing dry filters onto the agar plates. The filters can be marked for orientation, removed, and placed in washing solutions to block any remaining absorbent sites. The filters can then be placed in a solution containing, for example, a radioactive N-terminal peptide fragment of a STAT protein (e.g., a fragment having an amino acid sequence comprising SEQ ID NO: 1). After a specified incubation period, the filters can be thoroughly washed and developed for autoradiography. Plaques containing the phage that bind to the radioactive N-terminal peptide fragment of a STAT protein can then be identified. These phages can be further cloned and then retested for their ability to bind to the N-terminal peptide fragment of a STAT protein as before. Once the phages have been purified, the binding sequence contained within the phage can be determined by standard DNA sequencing techniques. Once the DNA sequence is known, synthetic peptides can be generated which represent these sequences.

[0113] These peptides can be tested, for example, for their ability to: (1) interfere with a dimeric STAT protein binding to a weak STAT DNA binding site; and (2) interfere with a dimeric STAT protein lacking the α4-tryptophan and/or a truncated STAT protein in which the N-terminal domain has been removed (e.g., Stat1tc [Vinkemeier et al., EMBO J. 15:5616 (1996)]) from binding to the same DNA binding site. If the peptide interferes in the first case but does not interfere in the latter case, it may be concluded that the peptide interferes with dimeric N-terminal STAT-STAT interaction.

[0114] The effective peptide(s) can be synthesized in large quantities for use in in vivo models and eventually in humans to modulate STAT signal transduction. It should be emphasized that synthetic peptide production is relatively non-labor intensive, and that the peptides are easily manufactured and quality controlled; thus, large quantities of the desired product can be produced quite cheaply. Similar combinations of mass produced synthetic peptides have recently been used with great success [Patarroyo, Vaccine 10:175-178 (1990)].

Binding Assays for Drug Screening Assays

[0115] The drug screening assays of the present invention may use any of a number of assays for measuring the stability of a STAT-STAT dimeric interaction, including N-terminal dimeric STAT fragments and/or a dimeric STAT-STAT-DNA binding interaction. In one embodiment the stability of a preformed DNA-protein complex between a dimeric STAT protein and its corresponding DNA binding site is examined as follows: the formation of a complex between the STAT protein and a labeled oligonucleotide is allowed to occur and unlabelled oligonucleotides are added in vast molar excess after the reaction reaches equilibrium. At various times after the addition of unlabelled competitor DNA, aliquots are layered on a running native polyacrylamide gel to determine free and bound oligonucleotides. In one preferred embodiment the protein is STAT1α, and two different labeled DNAs are used: the natural cfos site, an example of a “weak” site, and the mutated cfos-promoter element, the M67 site [Wagner et al., EMBO J. 9:4477 (1990)], an example of a “strong” site as described below. Other examples of weak sites include those in the promoter of the MIG gene, and those in the regulatory region of the interferon-γ gene. Other examples of strong sites include the selected optimum site, S1 [Horvath et al., Genes & Devel. 9:984 (1995)], or the promoter of the IRF-1 gene.
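One way the time course from such a competition experiment might be summarized is by an apparent off rate and half-life. The sketch below is not taken from the disclosure: it assumes that dissociation follows a single exponential, that the bound fraction has already been quantified from the gel at each time point, and it uses synthetic numbers and a hypothetical function name purely for illustration.

```python
import math


def estimate_off_rate(times_min, bound_fraction):
    """Fit ln(bound fraction) versus time by ordinary least squares.

    Returns (k_off in 1/min, half-life in min), assuming single-exponential
    decay of the labeled complex after addition of excess cold competitor.
    """
    logs = [math.log(b) for b in bound_fraction]
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_min, logs))
    k_off = -sxy / sxx
    return k_off, math.log(2) / k_off


# Synthetic example data (NOT experimental results): decay with k_off = 0.5/min.
times = [0.0, 1.0, 2.0, 4.0, 8.0]
bound = [math.exp(-0.5 * t) for t in times]

k_off, half_life = estimate_off_rate(times, bound)
print(f"k_off = {k_off:.3f} per min, half-life = {half_life:.2f} min")
```

Under these assumptions, a test compound that disrupts the dimer-dimer interaction would be expected to raise the apparent off rate on a weak site, and one that enhances the interaction to lower it.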

[0116] In a related binding assay, a nucleic acid containing a weak STAT binding site is placed on or coated onto a solid support. Methods for placing the nucleic acid on the solid support are well known in the art and include such things as linking biotin to the nucleic acid and linking avidin to the solid support. Dimeric STAT proteins are allowed to equilibrate with the nucleic acid and drugs are tested to see if they disrupt or enhance the binding. Disruption leads to either a faster release of the STAT protein, which may be expressed as a faster off time, and/or a greater concentration of released STAT dimer. Enhancement leads to either a slower release of the STAT protein, which may be expressed as a slower off time, and/or a lower concentration of released STAT protein.

[0117] The STAT protein may be labeled as described below. For example, in one embodiment radiolabeled STAT proteins are used to measure the effect of a drug on binding. In another embodiment the natural ultraviolet absorbance of the STAT protein is used. In yet another embodiment, a BIAcore chip (Pharmacia) coated with the nucleic acid is used and the change in surface conductivity can be measured.

[0118] In yet another embodiment, the effect of a prospective drug (a test compound) on interactions between N-terminal domains of STATs is assayed in living cells that contain or can be induced to contain activated STAT proteins, i.e., STAT protein dimers. Cells containing a reporter gene, such as the heterologous gene for luciferase, green fluorescent protein, chloramphenicol acetyl transferase or β-galactosidase, operably linked to a promoter comprising two weak STAT binding sites are contacted with a prospective drug in the presence of a cytokine which activates the STAT(s) of interest. The amount (and/or activity) of reporter produced in the absence and presence of prospective drug is determined and compared. Prospective drugs which reduce the amount (and/or activity) of reporter produced are candidate antagonists of the N-terminal interaction, whereas prospective drugs which increase the amount (and/or activity) of reporter produced are candidate agonists. Cells containing a reporter gene operably linked to a promoter comprising strong STAT binding sites are then contacted with these candidate drugs, in the presence of a cytokine which activates the STAT(s) of interest. The amount (and/or activity) of reporter produced in the presence and absence of candidate drugs is determined and compared. Drugs which disrupt interactions between dimeric N-terminal domains of the STATs will not reduce reporter activity in this second step. Similarly, candidate drugs which enhance interactions between dimeric N-terminal domains of STATs will not increase reporter activity in this second step.
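The two-step decision logic of this cell-based assay can be written out as a small sketch. The function name, the return labels and the 20% tolerance used to decide whether a reporter signal has changed are all illustrative assumptions, not values given in the disclosure; in practice the threshold would be set from the assay's own variability.

```python
def classify_candidate(weak_no_drug, weak_with_drug,
                       strong_no_drug, strong_with_drug,
                       tolerance=0.2):
    """Classify a test compound from weak-site and strong-site reporter readouts.

    tolerance is a hypothetical fractional change treated as "no change".
    """
    weak_ratio = weak_with_drug / weak_no_drug
    strong_ratio = strong_with_drug / strong_no_drug

    weak_reduced = weak_ratio < 1.0 - tolerance
    weak_increased = weak_ratio > 1.0 + tolerance
    strong_reduced = strong_ratio < 1.0 - tolerance
    strong_increased = strong_ratio > 1.0 + tolerance

    # Step 1 (weak sites) selects candidates; step 2 (strong sites) checks that
    # the effect is specific to the N-terminal dimer-dimer interaction.
    if weak_reduced and not strong_reduced:
        return "antagonist"
    if weak_increased and not strong_increased:
        return "agonist"
    return "nonspecific_or_inactive"


# Illustrative reporter readings in arbitrary units (not experimental data).
print(classify_candidate(100.0, 40.0, 100.0, 95.0))
print(classify_candidate(100.0, 40.0, 100.0, 45.0))
print(classify_candidate(100.0, 180.0, 100.0, 105.0))
```

A compound that lowers both reporters is set aside here because it may act on STAT activation or DNA binding generally rather than on the N-terminal interaction.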

[0119] In an analogous embodiment, two reporter genes, each operably under the control of one or the other of the two types of promoters described above, can be comprised in a single host cell as long as the expression of the two reporter gene products can be distinguished. For example, different modified forms of green fluorescent protein can be used as described in U.S. Pat. No. 5,625,048, issued Apr. 29, 1997, hereby incorporated by reference in its entirety. Although cells that naturally encode the STAT proteins may be used, preferably a cell is used that is transfected with a plasmid encoding the STAT protein. For example, transient transfections can be performed with 50% confluent U3A cells using the calcium phosphate method as instructed by the manufacturer (Stratagene). In addition, as mentioned above, the cells can also be modified to contain one or more reporter genes, a heterologous gene encoding a reporter such as luciferase, green fluorescent protein or derivative thereof, chloramphenicol acetyl transferase, β-galactosidase, etc. Such reporter genes can individually be operably linked to promoters comprising two weak STAT binding sites and/or a promoter comprising a strong STAT binding site. Assays for detecting the reporter gene products are readily available in the literature. For example, luciferase assays can be performed according to the manufacturer's protocol (Promega), and β-galactosidase assays can be performed as described by Ausubel et al. [in Current Protocols in Molecular Biology, J. Wiley & Sons, Inc. (1994)].

[0120] In one example, the transfection reaction can comprise the transfection of a cell with a plasmid modified to encode a STAT protein, such as a pcDNA3 plasmid (Invitrogen), a reporter plasmid that contains a first reporter gene, and a reporter plasmid that contains a second reporter gene. Although the preparation of such plasmids is now routine in the art, many appropriate plasmids are commercially available, e.g., a plasmid with β-galactosidase is available from Stratagene.

[0121] The reporter plasmids can contain specific restriction sites in which an enhancer element having a strong STAT binding site or alternatively two tandemly arranged “weak” STAT binding sites can be inserted. In one particular embodiment, thirty-six hours after transfection of the cells with a plasmid encoding STAT-1, the cells are treated with 5 ng/ml interferon-γ (Amgen) for ten hours. Protein expression and tyrosine phosphorylation (to monitor STAT activation) can be determined by, e.g., gel shift experiments with whole cell extracts.

Labels

[0122] Suitable labels include enzymes, fluorophores (e.g., fluorescein isothiocyanate (FITC), phycoerythrin (PE), Texas red (TR), rhodamine, free or chelated lanthanide series salts, especially Eu³⁺, to name a few fluorophores), chromophores, radioisotopes, chelating agents, dyes, colloidal gold, latex particles, ligands (e.g., biotin), and chemiluminescent agents.

[0123] When a control marker is employed, the same or different labels may be used for the test and control marker gene.

[0124] In the instance where a radioactive label, such as the isotopes ³H, ¹⁴C, ³²P, ³⁵S, ³⁶Cl, ⁵¹Cr, ⁵⁷Co, ⁵⁸Co, ⁵⁹Fe, ⁹⁰Y, ¹²⁵I, ¹³¹I, and ¹⁸⁶Re, is used, known currently available counting procedures may be utilized. In the instance where the label is an enzyme, detection may be accomplished by any of the presently utilized colorimetric, spectrophotometric, fluorospectrophotometric, amperometric or gasometric techniques known in the art.

[0125] Direct labels are one example of labels which can be used according to the present invention. A direct label has been defined as an entity, which in its natural state, is readily visible, either to the naked eye, or with the aid of an optical filter and/or applied stimulation, e.g., U.V. light to promote fluorescence. Examples of colored labels which can be used according to the present invention include metallic sol particles, for example, gold sol particles such as those described by Leuvering (U.S. Pat. No. 4,313,734); dye sol particles such as described by Gribnau et al. (U.S. Pat. No. 4,373,932) and May et al. (WO 88/08534); dyed latex such as described by May, supra, and Snyder (EP-A 0 280 559 and 0 281 327); or dyes encapsulated in liposomes as described by Campbell et al. (U.S. Pat. No. 4,703,017). Other direct labels include a radionuclide, a fluorescent moiety or a luminescent moiety. In addition to these direct labeling devices, indirect labels comprising enzymes can also be used according to the present invention. Various enzymes used in enzyme-linked immunoassays are well known in the art, for example, alkaline phosphatase, horseradish peroxidase, lysozyme, glucose-6-phosphate dehydrogenase, lactate dehydrogenase, and urease; these and others have been discussed in detail by Eva Engvall in Enzyme Immunoassay ELISA and EMIT in Methods in Enzymology, 70:419-439 (1980) and in U.S. Pat. No. 4,857,453.

[0126] Suitable enzymes include, but are not limited to, alkaline phosphatase, β-galactosidase, green fluorescent protein and its derivatives, luciferase, and horseradish peroxidase.

[0127] Other labels for use in the invention include magnetic beads or magnetic resonance imaging labels.

[0128] The present invention may be better understood by reference to the following non-limiting Example, which is provided as exemplary of the invention. The following example is presented in order to more fully illustrate the preferred embodiments of the invention. It should in no way be construed, however, as limiting the broad scope of the invention.

EXAMPLE

Structure of the Amino-Terminal Protein Interaction Domain of STAT-4

Introduction

[0129] STATs are a family of transcription factors that are specifically triggered to participate in gene activation when cells encounter cytokines and growth factors. The crystal structure of a conserved domain comprising the first 123 residues of STAT-4 has been determined at 1.45 Å. This domain (N-terminal domain) has been implicated in several protein:protein interactions affecting transcription. The N-terminal domain enables STAT dimers to interact and to bind DNA cooperatively, a mechanism important for gene activation and binding site discrimination. The domain consists of 8 helices that are assembled into a hook-like structure. The crystal structure shows dimers formed by polar interactions across one face of the hook, revealing the nature of the cooperative interactions between STAT dimers. Mutagenesis of an invariant Trp residue that is at the heart of this interface abolishes cooperative dimer:dimer DNA binding by the full length protein in vitro and reduces transcriptional response after cytokine stimulation in vivo.

Materials and Methods

[0130] Expression and Purification of the STAT-4 N-terminal Domain:

[0131] The STAT-4 N-terminal domain was expressed as a C-terminal fusion with glutathione S-transferase (GST). The expression vector was constructed by PCR amplification (Vent DNA-polymerase, New England Biolabs) of the appropriate region of mouse STAT-4 cDNA and cloning of the fragment into the BamHI and EcoRI sites of pGEX2T (Pharmacia). Sequence comparison with the STAT-1 amino-terminal domain led to the decision to terminate the homologous domain in STAT-4 after residue 124. To facilitate cleavage of the GST tag from the recombinant STAT-4, three glycine residues were included after the thrombin cleavage site [Guan and Dixon, Anal. Biochem. 192:262 (1991)]. The resulting protein has four additional amino-terminal amino acids and Met 1 of STAT-4 is replaced with Gly. The construct was verified by dideoxy sequencing. BL21(pLysS) cells were grown and lysed as described [Vinkemeier et al., EMBO J. 15:5616 (1996)]. Soluble protein was incubated with 0.2 vol. of a 50% (vol./vol.) slurry of glutathione agarose (Pharmacia). The bound protein was washed with cleavage buffer (50 mM Hepes/HCl pH 8.0, 150 mM KCl, 2.5 mM CaCl₂, 5 mM DTT). Cleavage with thrombin (~1 U/mg protein; Novagen) was performed overnight at room temperature with the protein bound to the beads. The eluted STAT-4 N-terminal domain fragment (in cleavage buffer) was concentrated to 20 mg/ml with Centricon (Amicon) and used for crystallization.

[0132] Crystallization and Data Collection and Processing:

[0133] Initial studies were performed with the STAT-1 N-terminal domain, but led only to small, needle-like crystals. Hanging drops of one μl of the STAT-4 N-terminal domain fragment were mixed with equal volumes of reservoir buffer containing 0.2 M Na⁺CH₃COO⁻, 0.1 M Tris/HCl pH 8.0, 17% PEG4000. Hexagonal crystals (0.2×0.2×0.2 mm) were routinely grown overnight at 20° C. The crystals contain one molecule of the STAT-4 N-terminal domain in the asymmetric unit, and are in the space group P6₅22 (a=79.51 Å, b=79.51 Å, c=84.68 Å). Crystals were cryo-protected in reservoir solution enriched in PEG to 20% and glycerol to 22.5% prior to flash freezing. Heavy atom derivatives were prepared by soaking crystals for 30 minutes in a saturated solution of parahydroxy-mercuribenzoic acid diluted 1:20 with the cryoprotective solution. Data for the native crystal were collected at the Brookhaven National Laboratory at beam line X25 using a MAR imaging plate detector system. A MAD experiment on a parahydroxy-mercuribenzoic acid derivatized crystal was performed at the Brookhaven National Laboratory on beam line X4A using Fuji imaging plates. Data processing and reduction were carried out with HKL, DENZO, and SCALEPACK (written by Z. Otwinowski and W. Minor).
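The reported cell dimensions and space group allow a quick consistency check on the statement that the asymmetric unit holds one molecule. The sketch below computes the hexagonal unit cell volume and a Matthews coefficient. It relies on assumptions that are not stated in the text: a molecular mass of about 14.5 kDa for the fragment (a rough estimate for a polypeptide of this length), 12 asymmetric units per cell for P6₅22, and the conventional approximation that the solvent fraction is 1 - 1.23/V_M. Function names are illustrative.

```python
import math


def hexagonal_cell_volume(a, c):
    """Volume in cubic Angstroms of a hexagonal cell (a = b, gamma = 120 degrees)."""
    return a * a * c * math.sin(math.radians(120.0))


def matthews_coefficient(cell_volume, asym_units_per_cell, mol_mass_da, mols_per_asu=1):
    """Crystal volume per unit of protein mass, in cubic Angstroms per dalton."""
    return cell_volume / (asym_units_per_cell * mols_per_asu * mol_mass_da)


A_AXIS = 79.51          # Angstroms, from the text
C_AXIS = 84.68          # Angstroms, from the text
ASYM_UNITS = 12         # general positions in P6(5)22
ASSUMED_MASS = 14500.0  # daltons; ASSUMPTION, not given in the text

volume = hexagonal_cell_volume(A_AXIS, C_AXIS)
v_m = matthews_coefficient(volume, ASYM_UNITS, ASSUMED_MASS)
solvent_fraction = 1.0 - 1.23 / v_m  # common approximation

print(f"cell volume: {volume:.0f} cubic Angstroms")
print(f"Matthews coefficient: {v_m:.2f}")
print(f"approximate solvent fraction: {solvent_fraction:.2f}")
```

With the assumed mass, one molecule per asymmetric unit gives a Matthews coefficient within the range commonly observed for protein crystals, whereas two molecules would halve it to an implausibly dense packing.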

[0134] Model Building and Refinement.

[0135] Model building was performed using O [Jones et al., Acta Crystallogr. A47:110-119 (1991)]. Bulk solvent correction and anisotropic B-factor scaling were applied during refinement using X-PLOR [Brünger, X-PLOR (Version 3.1) Manual, The Howard Hughes Medical Institute and Department of Molecular Biophysics and Biochemistry, Yale University, 260 Whitney Avenue, New Haven, Conn. 06511, (1992)]. The final refinement statistics are shown in Table 1. Of the five heterologous residues at the amino-terminus, the first three residues (GlySerGly) as well as the C-terminal residue Gln 124 are not visible in the electron density map. No amino acids occupy disallowed regions of the Ramachandran plot and 95% fall into the most favored region.

[0136] Preparation, Expression, and Purification of the Mutated STAT-1α:

[0137] Mutated STAT-1α (Trp 37 to Ala) was expressed from pAcSG2 in baculovirus-infected insect cells. PCR was used to exchange codon 37 [TGG] with [GCA] in the NcoI/SpeI fragment of human STAT-1 cDNA. Additionally, a 6-His tag was added to the C-terminus. Modifications were confirmed by sequencing. Insect cells were lysed (Dounce homogenizer), and mutated STAT-1α was purified under native conditions on Ni²⁺-nitrilotriacetic acid (Qiagen) and eluted with 200 mM imidazole in 20 mM Tris/HCl, pH 8.0, 10 mM MgCl₂, 50 mM KCl, 5 mM DTT.

[0138] Preparation of EGF-Receptor Kinase and in vitro Phosphorylation of STAT Proteins.

[0139] Tyrosine phosphorylated wild type human STAT-1α was produced as described [Vinkemeier et al., EMBO J. 15:5616 (1996)]. In vitro phosphorylation was done as described [Vinkemeier et al., EMBO J. 15:5616 (1996)]. Human carcinoma A431 cells were grown to 90% confluency in 150 mm diameter plates in Dulbecco's modified Eagle's medium supplemented with 10% bovine calf serum (Hyclone). Cells were washed once with chilled PBS and lysates were prepared in 1 ml ice cold lysis buffer (10 mM Hepes/HCl, 150 mM NaCl, 0.5% Triton X-100, 10% Glycerol, 1 mM Na₃VO₄, 10 mM EDTA, Complete™ protease inhibitors, pH 7.5). After 10 min on ice, the cells were scraped, vortexed and Dounce homogenized (5 strokes). The lysates were cleared by centrifugation at 4° C. for 20 min at top speed in an Eppendorf microfuge and stored at −70° C. until needed. Immediately before use 1 volume of the lysate was diluted with 4 volumes of the lysis buffer (“diluted lysate”). EGF-receptor precipitates were obtained by incubating 5 ml of diluted lysate with 50 μg of an anti-EGF-receptor monoclonal antibody directed against the extracellular domain. After 2 hours of rotating the sample at 4° C., 750 μl of Protein-A-agarose (50% slurry; Oncogene Science) was added, and the incubation was allowed to proceed, while rotating, for another 1 hour. Agarose beads containing the EGF-receptor immunoprecipitates were then washed 5 times with lysis buffer and finally twice with storage buffer (20% Glycerol, 20 mM Hepes/HCl, 100 mM NaCl, 0.1 mM Na₃VO₄). Precipitates from 5 ml diluted lysate were dissolved in 0.5 ml storage buffer, flash frozen on dry ice and stored at −70° C. Immediately before an in vitro kinase reaction the Protein-A-agarose bound EGF-receptor from 5 ml diluted lysate was washed once with 1× kinase buffer (20 mM Tris/HCl, 50 mM KCl, 0.3 mM Na₃VO₄, 2 mM DTT, pH 8.0) and then dissolved in 0.4 ml (total volume) of this buffer. Afterwards the washed EGF-receptor precipitate was incubated on ice for 10 min in the presence of a final concentration of mouse EGF of 0.15 ng/μl. Phosphorylation reactions were carried out in Eppendorf tubes in a final volume of 1 ml. To the pre-incubated kinase preparation the following was added: 60 μl 10× kinase buffer, 20 μl 0.1 M DTT, 50 μl 0.1 M ATP, 4 mg STAT protein (Superdex 200 eluate for STAT1α and STAT1β; ammonium sulfate pellets dissolved in 20 mM Tris/HCl, pH 8.0 for STAT1tc), 10 μl 1 M MnCl₂ and dH₂O to 1 ml. The reaction was allowed to proceed for 15 hours at 4° C. After 3 hours an additional 15 μl of 0.1 M ATP was added.
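The final concentrations in the 1 ml phosphorylation reaction follow from the stock concentrations and volumes listed above. The sketch below is an illustration only. It assumes simple dilution (C1·V1 = C2·V2), ignores the DTT already carried in by the kinase buffer, and treats the later ATP addition as if the volume stayed at 1 ml; the helper name is hypothetical.

```python
def final_concentration_mM(stock_molar, added_ul, total_ul=1000.0):
    """Final concentration in mM after diluting `added_ul` of stock into `total_ul`."""
    return stock_molar * 1000.0 * added_ul / total_ul


# Volumes and stock concentrations as listed in the text.
atp_initial_mM = final_concentration_mM(0.1, 50)    # 50 ul of 0.1 M ATP
atp_added_mM = final_concentration_mM(0.1, 15)      # 15 ul of 0.1 M ATP after 3 hours
mncl2_mM = final_concentration_mM(1.0, 10)          # 10 ul of 1 M MnCl2
dtt_added_mM = final_concentration_mM(0.1, 20)      # 20 ul of 0.1 M DTT (buffer DTT ignored)

print(f"ATP at start: {atp_initial_mM:.1f} mM")
print(f"ATP added later: {atp_added_mM:.1f} mM (volume change ignored)")
print(f"MnCl2: {mncl2_mM:.1f} mM")
print(f"DTT from the 0.1 M stock: {dtt_added_mM:.1f} mM")
```

With 4 mg of STAT protein in the same 1 ml, the substrate is present at 4 mg/ml.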

[0140] Gel Shift Experiments and Determination of Tetramer Stability:

[0141] Gel shift experiments and determination of tetramer stability were done as described [Vinkemeier et al., EMBO J. 15:5616 (1996)] with an oligonucleotide containing two copies of the STAT recognition element from the c-fos gene [Wagner et al., EMBO J. 9:4477 (1990)] spaced by 10 base pairs (5′-GCCAGTCAGTTCCCGTCAATGCATCAGGTTCCCGTCAATGCAT-3′). Both protein preparations (Tyr-phosphorylated wild type STAT-1α and W37A mutant) were titrated in gel shift experiments with an oligonucleotide containing a single M67 site (5′-GCCGATTTCCCGTAAATCAT-3′) [Wagner et al., EMBO J. 9:4477 (1990)] to assure similar loading of active protein. A 12.5 μl reaction volume contained DNA binding buffer (20 mM Hepes/HCl, 4% Ficoll, 40 mM KCl, 10 mM MgCl₂, 10 mM CaCl₂, 1 mM DTT), radiolabelled DNA at a final concentration of 1×10⁻¹⁰ M unless stated otherwise, 50 ng dIdC, 0.2 mg/ml BSA (Boehringer Mannheim), and the indicated amount of purified phosphorylated STAT protein. The reaction volume was mixed and then incubated at room temperature. The time necessary to reach equilibrium was assessed by EMSA [Stone et al., in Jost, J. P. & Saluz, H. P. (eds.) A Laboratory Guide to In Vitro Studies of Protein-DNA Interactions, BioMethods, vol. 5:163-195 (1991)]. For all DNA fragments tested, equilibrium turned out to be fully established at the earliest timepoint that can be determined by this technique (30 sec). Therefore incubation periods of 5-15 minutes were chosen. Reaction products were loaded onto a 4% polyacrylamide gel (1.5 mm thick) containing 0.25× Tris-borate-EDTA which had been pre-run at 20 V/cm for 2 hours at 4° C. Electrophoresis was continued for 60 minutes at 4° C. Gels were dried, exposed to X-ray film, and quantitated with a Molecular Dynamics PhosphorImager.
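As a quick check on the tandem-site probe and the DNA input described above, the sketch below locates a repeated string in the 43-nucleotide oligonucleotide and converts the stated DNA concentration into an amount per reaction. It is illustrative only. The 10-nucleotide string TTCCCGTCAA is simply a string that occurs twice in the probe and is an assumption made for the example, not a statement about the exact boundaries of the recognition element.

```python
TANDEM_PROBE = "GCCAGTCAGTTCCCGTCAATGCATCAGGTTCCCGTCAATGCAT"

# Assumption for illustration: a 10-nt string shared by both copies of the site.
REPEATED_STRING = "TTCCCGTCAA"


def find_all(sequence, motif):
    """Return the 0-based start index of every occurrence of motif in sequence."""
    starts = []
    position = sequence.find(motif)
    while position != -1:
        starts.append(position)
        position = sequence.find(motif, position + 1)
    return starts


starts = find_all(TANDEM_PROBE, REPEATED_STRING)

# Amount of labelled DNA per reaction: 1e-10 M in 12.5 microliters.
dna_molar = 1e-10
reaction_liters = 12.5e-6
dna_femtomoles = dna_molar * reaction_liters * 1e15

print(f"probe length: {len(TANDEM_PROBE)} nt")
print(f"repeat start positions (0-based): {starts}")
print(f"labelled DNA per reaction: {dna_femtomoles:.2f} fmol")
```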

[0142] Transient Transfections:

[0143] Transient transfections were performed on six well plates with 50% confluent U3A cells using the calcium phosphate method as instructed by the manufacturer (Stratagene) with the following modifications. Transfection reactions contained per well 4.5 μg of either wild type STAT-1α or the W37A mutant in plasmid pcDNA3 (Invitrogen), 4 μg luciferase reporter plasmid pLuc, and 0.4 μg β-galactosidase reporter plasmid (Stratagene).

[0144] The luciferase reporter contained in its BamHI site, as an enhancer element, two tandemly arranged “weak” STAT-1 binding sites (5′-GATCAGTTCCCGTCAATCATGATCCAGTTCCCGTCAATGATCCCCGGGATC-3′, binding sites underlined) from the human c-fos promoter. Thirty-six hours after transfection, cells were treated with 5 ng/ml interferon-γ (Amgen) for 10 hours or left untreated. Luciferase assays were performed according to the manufacturer's protocol (Promega) and β-galactosidase assays were done as previously described [Ausubel et al., in Current Protocols in Molecular Biology, J. Wiley & Sons, Inc. (1994)]. Protein expression and Tyr-phosphorylation were checked in gel shift experiments with whole cell extracts for both wild type and mutant protein and were comparable. All results shown are luciferase activities normalized against the internal control β-galactosidase activity.
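The normalization described above amounts to dividing each luciferase reading by the β-galactosidase reading from the same well and then comparing treated with untreated cells. The following is a minimal sketch of that arithmetic. All readings are hypothetical values chosen for illustration and are not data from the experiments described herein; the function names are likewise illustrative.

```python
def normalized_activity(luciferase, beta_gal):
    """Luciferase signal divided by the internal-control beta-galactosidase signal."""
    if beta_gal <= 0:
        raise ValueError("beta-galactosidase reading must be positive")
    return luciferase / beta_gal


def fold_induction(treated, untreated):
    """Ratio of normalized activity with and without interferon treatment."""
    return treated / untreated


# Hypothetical raw readings (arbitrary units), for illustration only.
untreated = normalized_activity(luciferase=1000.0, beta_gal=0.50)
treated = normalized_activity(luciferase=2200.0, beta_gal=0.55)

print(f"normalized, untreated: {untreated:.0f}")
print(f"normalized, treated: {treated:.0f}")
print(f"fold induction: {fold_induction(treated, untreated):.2f}")
```

Normalizing before taking the ratio keeps differences in transfection efficiency between wells from being read as differences in induction.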

Results:

[0145] Proteolytic digestion of purified STAT-1 showed that the N-terminal 131 residues form a stable domain that is readily cleaved off the intact molecule, indicating that it is an independently folded module [Vinkemeier et al., EMBO J. 15:5616 (1996)]. Sequence alignments show that the N-terminal domain is highly conserved in the STAT family of proteins (FIG. 1). The average sequence identity for this region between mammalian STAT proteins is 40%, and ranges from 51% between STAT-1 and STAT-4 to 20% between STAT-5 and STAT-6. Over the approximately 750 amino acids that span the length of the common core of the STATs only the SH2 domain is more highly conserved [Schindler and Darnell, Annu. Rev. Biochem. 64:621 (1995)]. The N-terminal domain is also found in the Drosophila STAT (dSTAT92E) [Hou et al., Cell 84:411 (1996); Yan et al., Cell 84:421 (1996)] (FIG. 1) and in the recently discovered STAT in Dictyostelium discoideum [Kawata et al., Cell 89:909 (1997)]. Interestingly, the first gene defect established in the dSTAT92E gene is a misspliced variant that produces both normal mRNA and an mRNA encoding only the N-terminal 41 residues. Expression of this fragment has a partial dominant negative effect on transcriptional activation by the wild type protein in cell culture. In the fly it is associated with a weak abnormal phenotype [Yan et al., Proc. Natl. Acad. Sci. USA 93:5842 (1996)].
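Percent identity figures such as those quoted above are computed by counting identical positions in an alignment. The sketch below shows the calculation on a short ungapped window: the first 16 residues of SEQ ID NO: 7 and SEQ ID NO: 9 from the sequence listing, written in one-letter code. It is an illustration of the arithmetic only. It does not use the alignment of FIG. 1, makes no assumption about which STAT each sequence represents, and is not expected to reproduce the whole-domain values.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of identical positions between two pre-aligned, equal-length sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y)
    return 100.0 * matches / len(aligned_a)


# First 16 residues of SEQ ID NO: 7 and SEQ ID NO: 9 (one-letter code).
SEQ7_WINDOW = "MSQWFELQQLDSKFLE"
SEQ9_WINDOW = "MSQWNQVQQLEIKFLE"

identity = percent_identity(SEQ7_WINDOW, SEQ9_WINDOW)
print(f"identity over the {len(SEQ7_WINDOW)}-residue window: {identity:.2f}%")
```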

[0146] As disclosed herein, the crystal structure of the N-terminal domain of STAT-4 has been determined by multi-wavelength anomalous diffraction and refined at a resolution of 1.45 Å (R=19.4%, R_(free)=22.3%) (Table 1). The STAT-4 N-domain is all helical, with an unusual architecture. Instead of the up-down connectivity of helix bundles or the box-like helical packing of the globin fold, the N-domain is constructed from three distinct structural elements that pack together. The N-terminal 40 residues encompass the first 4 helices (α1-α4), which form a ring-shaped element (colored red in FIGS. 2 and 3B). A small helix (α5) connects this ring to the next structural element, an anti-parallel coiled-coil formed by helices α6 and α7. The heptad repeat of hydrophobic amino acids, characteristic of coiled-coils, is conserved across the STATs (FIG. 1). Finally, the distal surface of the ring-shaped element forms a docking site for the last helix in the structure (α8). The overall appearance of the structure is that of a triangular hook, with the inner surface of the hook being formed by the intersection of the proximal surfaces of the ring-shaped element and the coiled-coil.

[0147] The N-domain of STAT-4 is dimeric in solution, and a two-fold symmetry axis in the crystal generates a dimer with an extensive polar interface that involves one face of the hook. The N-domain has a well defined hydrophobic core that is conserved across the STATs, consistent with a stable and defined fold (FIGS. 1 and 2). However, a notable feature of the N-terminal ring-shaped element is that it is stabilized by polar interactions involving buried charges. The ring is closed off by a helix-dipole interaction between the N-terminal region of helix α1 and the carboxylate group of Glu 39, presented by the C-terminal region of helix α4 (FIG. 2B). Glu 39 forms a hydrogen bond with the amide nitrogen of the Gln 3 residue, and is oriented correctly for this charge dipole interaction by the side chain of Arg 31, which in turn forms a buried ion pair with Glu 112. Glu 112 is positioned by interactions with Tyr 22 and a buried water molecule. Each of the side chains involved is invariant in all STATs (FIG. 1), indicating that the ring-shaped element is conserved in the STAT architecture.

TABLE 1
Summary of the Crystallographic Analysis

                  Resolution  Reflections    Completeness   R_(sym)*    I_(n)/σ†    Sites  FOM#
                  (Å)         total/unique   (%)            (%)                     (N)
Native Data       1.45        237647/27594   96.4 (91.7)    7.2 (19.2)  19.6 (4.2)
MAD Analysis
λ1 = 1.00842 Å    2.15        112445/8854    98.6 (95.7)    7.7 (21.2)  24.7 (6.1)  2
λ2 = 1.01337 Å    2.51        31403/8992     100.0 (100.0)  7.3 (21.1)  27.2 (8.1)  2
λ3 = 1.00932 Å    2.00        126292/10731   96.6 (95.7)    8.4 (33.4)  22.1 (4.5)  2      0.542

Refinement        Cut Off         Reflections  Completeness (%)  R-factor (%)¶  R-free (%)‡
10-1.45 Å         |F|/σ(|F|)>2    25285        88.8              18.8           21.6
10-1.45 Å         all data        26560        93.3              19.4           22.3

r.m.s. deviation of model
  bond lengths (Å)                 0.015
  bond angles (degrees)            1.4
average B-factor (protein) (Å²)    9.6
water molecules                    165
average B-factor (water) (Å²)      25

[0148] TABLE II
Interactions Between the Two N-terminal Domains of Stat4

sidechain-sidechain contacts; direct

Molecule A   Å     Molecule B
Asn5         3.0   Asn5
Trp37        3.2   Glu66
Thr40        3.2   Arg70
             3.2   Gln67
Gln41        2.9   Glu66
Asp42        3.3   Lys73
Gln63        3.6   Gln63
Glu66        3.2   Trp37
Gln67        3.2   Thr40
Arg70        3.0   Thr40
Lys73        3.3   Asp42

sidechain-sidechain contacts; water mediated

Molecule A   Å     Water A     Å     Water B   Å     Water C   Å     Molecule B
Gln36        3.0   H₂O 984     2.9   H₂O 269                   2.7   Glu39
Trp37        3.1   H₂O 167                                     3.0   Gln63
Glu39        2.8   H₂O 186                                     3.4   Arg70
             2.7   H₂O 269                                     2.8   Arg70
Thr40        2.8   H₂O 314                                     2.8   Gln67
             2.8   H₂O 314     2.8   H₂O 6     2.8   H₂O 314   2.8   Thr40
                                                               2.8   Glu66
Gln41        2.8   H₂O 5212                                    2.9   Glu66
Asn59        2.9   H₂O 212                                     2.9   Glu66
Gln63        2.7   H₂O 901                                     2.7   Gln63
             3.0   H₂O 164                                     3.1   Trp37
Glu66        2.7   H₂O 164                                     3.1   Trp37
             2.8   H₂O 6       2.8   H₂O 314                   2.8   Thr40
             2.9   H₂O 212                                     2.9   Asn59
                                                               2.8   Gln41
Gln67        3.2   H₂O 6       2.8   H₂O 314                   2.8   Thr40
Arg70        2.8   H₂O 314                                     2.8   Thr40
             3.4   H₂O 186                                     2.8   Glu39
             3.1   H₂O 269                                     2.7   Glu39

sidechain-mainchain contacts; direct

Molecule A   Å     Molecule B
Gln36-N      2.9   Gln36
Glu39-O      3.6   Arg70
Arg70        2.6   Glu39-O

sidechain-mainchain contacts; water mediated

Molecule A   Å     Water A     Å     Water B    Å     Molecule B
His32-O      2.9   H₂O 1024                     3.1   Gln36
Gln36-N      3.2   H₂O 41                       2.7   Gln36
Gln36        3.1   H₂O 1024                     2.9   His31-O
Trp37-N      2.9   H₂O 41                       2.7   Gln36
Glu66-O      2.7   H₂O 5107    2.8   H₂O 5078   2.7   Gln41

[0149] A consequence of the utilization of these polar groups in the ring-shaped element is the formation of a compact and potentially specific interaction surface. This structural element forms a relatively flat molecular surface that packs at an angle against another surface presented by the coiled-coil formed by helices α6 and α7. The juxtaposition of the surface of the ring-shaped element with that of the coiled-coil results in a wedge-shaped groove. This groove is lined with hydrophobic residues, with polar residues at the center, and has the natural appearance of a ligand binding site. A possible function for this groove is suggested by the fact that replacement of Arg 31 or Glu 39 in STAT-1 by Ala results in a molecule that is much slower to dephosphorylate after interferon-γ induction when compared to wild type protein [Shuai et al., Mol. Cell. Biol. 16:4932 (1996)]. Mutation of these two residues is likely to drastically alter the conformation of the ring-shaped element (FIG. 2B), suggesting that a phosphatase that controls STAT dephosphorylation might bind to the groove in the N-terminal domain.

[0150] There is one molecule in the asymmetric unit of the STAT-4 crystal, and it is related to another by a two-fold symmetry axis (FIGS. 2A and 3A). There is an extensive interface between the two monomers of the dimer, burying 1,714 Å² of surface area (FIG. 3A). An extended intermolecular hydrogen bonding network is formed at the interface, involving 15 amino acid side chains and 12 water molecules per monomer (FIG. 3B). In addition, 5 backbone contacts are observed in each monomer. Eleven of the 15 residues at the interface make direct hydrogen bonding contacts to the other monomer. The water molecules at the interface are very well defined in the electron density map, many of them having low temperature factors (<10 Å²) (FIG. 3C).

[0151] The interface is almost entirely polar, with no involvement of hydrophobic side chains. In contrast to the leucine zipper, wherein hydrophobic residues are utilized to generate the intermolecular interface by the formation of a coiled-coil across the dimer interface [Lupas, Trends in Biochemical Sciences 21:375 (1996)], the coiled-coil in the N-terminal domain is firmly anchored within one N-terminal domain and its role is to serve as an architectural support for the presentation of a number of interacting side chains at the dimer interface and at the potential interaction groove. While hydrophobic interactions are associated with stabilization of folded protein structures and are often found at the core of tight interfaces, polar interactions can provide both stability and specificity in protein-protein interactions [Fersht et al., Nature 314:235 (1985); Xu et al., Journal of Molecular Biology 265:68 (1997)]. In contrast to the residues that constitute the buried core of the N-terminal domain, which are conserved across STATs, the majority of the residues at the dimer interface are not conserved (FIGS. 1 and 3B). This variation could provide specificity in STAT dimer:dimer interactions on DNA. Only two of the residues at the interface are invariant in all STATs: Trp 37, a central anchor residue at the interface, and Glu 39, which also participates in the formation of the ring-shaped element.

[0152] To test the physiological relevance of the dimer that is observed in the crystal structure, it was determined whether mutation of the critical Trp 37 residue at the interface would disrupt or reduce oligomerization in vitro and transcriptional activation in vivo. These experiments were carried out using STAT-1, for which a DNA-protein interaction assay had been worked out previously [Vinkemeier et al., EMBO J. 15:5616 (1996)]. In addition, a cell line (U3A) lacking STAT-1 is available [Muller et al., EMBO J. 12:4221 (1993)], allowing the introduction of expression vectors encoding mutant STAT-1 molecules. The close similarity between the N-terminal domains of STAT-1 and STAT-4 (51% amino acid identity) ensures that the structural information derived from the STAT-4 crystal structure is an accurate representation of the STAT-1 architecture.

[0153] Conditions under which an oligonucleotide bearing tandem binding sites binds full length STAT-1 as a tetramer have been established; the oligonucleotide contains two “weak” binding sites, spaced 10 base pairs apart [Vinkemeier et al., EMBO J. 15:5616 (1996)]. Competition experiments show that the off time is long for wild type STAT-1 (greater than 15-30 minutes), indicating the formation of a stable complex. In contrast, if an oligonucleotide containing only one “weak” site is used instead, the off time is less than 30 seconds [Vinkemeier et al., EMBO J. 15:5616 (1996)]. The stabilization of STAT-1 on oligonucleotides containing tandem binding sites is not observed if the N-terminal domain is deleted [Vinkemeier et al., EMBO J. 15:5616 (1996)]. This assay was used to test the DNA binding properties of STAT-1 in which Trp 37 was replaced by Ala (W37A). Trp 37 plays a central role at the dimer interface, and participates in both direct and water mediated interactions with the other monomer (FIG. 3C).

[0154] The W37A mutant protein binds to the DNA probe with tandem sites, but the interaction is completely displaced by the addition of unlabelled oligonucleotides (FIG. 4A). In contrast, the wild-type protein is resistant to displacement for more than 15 minutes. The same two tandem “weak” binding sites were used to drive transcription from a reporter gene in an interferon-dependent transcriptional assay. U3A cells were co-transfected with the reporter gene and with either wild type or W37A mutant STAT-1. The rather weak transcriptional induction by interferon γ (approximately 2-fold) was abolished by the mutation (FIG. 4B). Activated and dimeric STAT proteins do not form detectable tetramers in solution in the absence of DNA. It is not known whether this is a consequence of limited binding affinity between N-terminal domains or whether the conformation of the STAT molecule in the absence of DNA impedes further oligomerization. In any case, the presentation of highly polar and unique interaction surfaces by the N-terminal domains of the STATs provides a ready means for generating very specific interactions between adjacent STAT dimers on the DNA, since the hydrogen bonding requirements of the interacting groups place stereochemical constraints on potential partners. While each N-domain dimer is closed, the fact that each STAT dimer presents two N-domains for interaction makes possible the generation of open ended STAT-STAT interactions that are limited only by the nature and number of the adjacent DNA binding sites.

[0155] The present invention is not to be limited in scope by the specific embodiments described herein. Indeed, various modifications of the invention in addition to those described herein will become apparent to those skilled in the art from the foregoing description and the accompanying figures. Such modifications are intended to fall within the scope of the appended claims.

[0156] It is further to be understood that all base sizes or amino acid sizes, and all molecular weight or molecular mass values, given for nucleic acids or polypeptides are approximate, and are provided for description.

[0157] Various publications are cited herein, the disclosures of which are incorporated by reference in their entireties. The citation of any reference herein should not be construed as an admission that such reference is available as “Prior Art” to the instant application.

1 13 13 amino acids amino acid single linear peptide NO 1 Arg Xaa XaaLeu Xaa Xaa Trp Xaa Glu Xaa Gln Xaa Trp 1 5 10 851 amino acids aminoacid single linear protein NO 2 Met Ala Gln Trp Glu Met Leu Gln Asn LeuAsp Ser Pro Phe Gln Asp 1 5 10 15 Gln Leu His Gln Leu Tyr Ser His SerLeu Leu Pro Val Asp Ile Arg 20 25 30 Gln Tyr Leu Ala Val Trp Ile Glu AspGln Asn Trp Gln Glu Ala Ala 35 40 45 Leu Gly Ser Asp Asp Ser Lys Ala ThrMet Leu Phe Phe His Phe Leu 50 55 60 Asp Gln Leu Asn Tyr Glu Cys Gly ArgCys Ser Gln Asp Pro Glu Ser 65 70 75 80 Leu Leu Leu Gln His Asn Leu ArgLys Phe Cys Arg Asp Ile Gln Pro 85 90 95 Phe Ser Gln Asp Pro Thr Gln LeuAla Glu Met Ile Phe Asn Leu Leu 100 105 110 Leu Glu Glu Lys Arg Ile LeuIle Gln Ala Gln Arg Ala Gln Leu Glu 115 120 125 Gln Gly Glu Pro Val LeuGlu Thr Pro Val Glu Ser Gln Gln His Glu 130 135 140 Ile Glu Ser Arg IleLeu Asp Leu Arg Ala Met Met Glu Lys Leu Val 145 150 155 160 Lys Ser IleSer Gln Leu Lys Asp Gln Gln Asp Val Phe Cys Phe Arg 165 170 175 Tyr LysIle Gln Ala Lys Gly Lys Thr Pro Ser Leu Asp Pro His Gln 180 185 190 ThrLys Glu Gln Lys Ile Leu Gln Glu Thr Leu Asn Glu Leu Asp Lys 195 200 205Arg Arg Lys Glu Val Leu Asp Ala Ser Lys Ala Leu Leu Gly Arg Leu 210 215220 Thr Thr Leu Ile Glu Leu Leu Leu Pro Lys Leu Glu Glu Trp Lys Ala 225230 235 240 Gln Gln Gln Lys Ala Cys Ile Arg Ala Pro Ile Asp His Gly LeuGlu 245 250 255 Gln Leu Glu Thr Trp Phe Thr Ala Gly Ala Lys Leu Leu PheHis Leu 260 265 270 Arg Gln Leu Leu Lys Glu Leu Lys Gly Leu Ser Cys LeuVal Ser Tyr 275 280 285 Gln Asp Asp Pro Leu Thr Lys Gly Val Asp Leu ArgAsn Ala Gln Val 290 295 300 Thr Glu Leu Leu Gln Arg Leu Leu His Arg AlaPhe Val Val Glu Thr 305 310 315 320 Gln Pro Cys Met Pro Gln Thr Pro HisArg Pro Leu Ile Leu Lys Thr 325 330 335 Gly Ser Lys Phe Thr Val Arg ThrArg Leu Leu Val Arg Leu Gln Glu 340 345 350 Gly Asn Glu Ser Leu Thr ValGlu Val Ser Ile Asp Arg Asn Pro Pro 355 360 365 Gln Leu Gln Gly Phe ArgLys Phe Asn Ile Leu Thr Ser Asn Gln Lys 370 375 380 Thr Leu Thr Pro GluLys Gly Gln 
Ser Gln Gly Leu Ile Trp Asp Phe 385 390 395 400 Gly Tyr LeuThr Leu Val Glu Gln Arg Ser Gly Gly Ser Gly Lys Gly 405 410 415 Ser AsnLys Gly Pro Leu Gly Val Thr Glu Glu Leu His Ile Ile Ser 420 425 430 PheThr Val Lys Tyr Thr Tyr Gln Gly Leu Lys Gln Glu Leu Lys Thr 435 440 445Asp Thr Leu Pro Val Val Ile Ile Ser Asn Met Asn Gln Leu Ser Ile 450 455460 Ala Trp Ala Ser Val Leu Trp Phe Asn Leu Leu Ser Pro Asn Leu Gln 465470 475 480 Asn Gln Gln Phe Phe Ser Asn Pro Pro Lys Ala Pro Trp Ser LeuLeu 485 490 495 Gly Pro Ala Leu Ser Trp Gln Phe Ser Ser Tyr Val Gly ArgGly Leu 500 505 510 Asn Ser Asp Gln Leu Ser Met Leu Arg Asn Lys Leu PheGly Gln Asn 515 520 525 Cys Arg Thr Glu Asp Pro Leu Leu Ser Trp Ala AspPhe Thr Lys Arg 530 535 540 Glu Ser Pro Pro Gly Lys Leu Pro Phe Trp ThrTrp Leu Asp Lys Ile 545 550 555 560 Leu Glu Leu Val His Asp His Leu LysAsp Leu Trp Asn Asp Gly Arg 565 570 575 Ile Met Gly Phe Val Ser Arg SerGln Glu Arg Arg Leu Leu Lys Lys 580 585 590 Thr Met Ser Gly Thr Phe LeuLeu Arg Phe Ser Glu Ser Ser Glu Gly 595 600 605 Gly Ile Thr Cys Ser TrpVal Glu His Gln Asp Asp Asp Lys Val Leu 610 615 620 Ile Tyr Ser Val GlnPro Tyr Thr Lys Glu Val Leu Gln Ser Leu Pro 625 630 635 640 Leu Thr GluIle Ile Arg His Tyr Gln Leu Leu Thr Glu Glu Asn Ile 645 650 655 Pro GluAsn Pro Leu Arg Phe Leu Tyr Pro Arg Ile Pro Arg Asp Glu 660 665 670 AlaPhe Gly Cys Tyr Tyr Gln Glu Lys Val Asn Leu Gln Glu Arg Arg 675 680 685Lys Tyr Leu Lys His Arg Leu Ile Val Val Ser Asn Arg Gln Val Asp 690 695700 Glu Leu Gln Gln Pro Leu Glu Leu Lys Pro Glu Pro Glu Leu Glu Ser 705710 715 720 Leu Glu Leu Glu Leu Gly Leu Val Pro Glu Pro Glu Leu Ser LeuAsp 725 730 735 Leu Glu Pro Leu Leu Lys Ala Gly Leu Asp Leu Gly Pro GluLeu Glu 740 745 750 Ser Val Leu Glu Ser Thr Leu Glu Pro Val Ile Glu ProThr Leu Cys 755 760 765 Met Val Ser Gln Thr Val Pro Glu Pro Asp Gln GlyPro Val Ser Gln 770 775 780 Pro Val Pro Glu Pro Asp Leu Pro Cys Asp LeuArg His Leu Asn Thr 785 790 795 800 Glu Pro Met Glu Ile Phe Arg Asn CysVal Lys Ile Glu Glu Ile 
Met 805 810 815 Pro Asn Gly Asp Pro Leu Leu AlaGly Gln Asn Thr Val Asp Glu Val 820 825 830 Tyr Val Ser Arg Pro Ser HisPhe Tyr Thr Asp Gly Pro Leu Met Pro 835 840 845 Ser Asp Phe 850 5 aminoacids amino acid single linear peptide NO 3 Gly Ser Gly Gly Gly 1 5 43base pairs nucleic acid single linear other nucleic acid /desc =“PRIMER” NO 4 GCCAGTCAGT TCCCGTCAAT GCATCAGGTT CCCGTCAATG CAT 43 20 basepairs nucleic acid single linear other nucleic acid /desc = “PRIMER” NO5 GCCGATTTCC CGTAAATCAT 20 52 base pairs nucleic acid single linearother nucleic acid /desc = “PRIMER” NO 6 GATCAGTTCC CGTCAATCNATGATCCAGTT CCCGTCAATG ATCCCCGGGA TC 52 749 amino acids amino acid singlelinear protein NO 7 Met Ser Gln Trp Phe Glu Leu Gln Gln Leu Asp Ser LysPhe Leu Glu 1 5 10 15 Gln Val His Gln Leu Tyr Asp Asp Ser Phe Pro MetGlu Ile Arg Gln 20 25 30 Tyr Leu Ala Gln Trp Leu Glu Lys Gln Asp Trp GluHis Ala Ala Tyr 35 40 45 Asp Val Ser Phe Ala Thr Ile Arg Phe His Asp LeuLeu Ser Gln Leu 50 55 60 Asp Asp Gln Tyr Ser Arg Phe Ser Leu Glu Asn AsnPhe Leu Leu Gln 65 70 75 80 His Asn Ile Arg Lys Ser Lys Arg Asn Leu GlnAsp Asn Phe Gln Glu 85 90 95 Asp Pro Val Gln Met Ser Met Ile Ile Tyr AsnCys Leu Lys Glu Glu 100 105 110 Arg Lys Ile Leu Glu Asn Ala Gln Arg PheAsn Gln Ala Gln Glu Gly 115 120 125 Asn Ile Gln Asn Thr Val Met Leu AspLys Gln Lys Glu Leu Asp Ser 130 135 140 Lys Val Arg Asn Val Lys Asp GlnVal Met Cys Ile Glu Gln Glu Ile 145 150 155 160 Lys Thr Leu Glu Glu LeuGln Asp Glu Tyr Asp Phe Lys Cys Lys Thr 165 170 175 Ser Gln Asn Arg GluGly Glu Ala Asn Gly Val Ala Lys Ser Asp Gln 180 185 190 Lys Gln Glu GlnLeu Leu Leu His Lys Met Phe Leu Met Leu Asp Asn 195 200 205 Lys Arg LysGlu Ile Ile His Lys Ile Arg Glu Leu Leu Asn Ser Ile 210 215 220 Glu LeuThr Gln Asn Thr Leu Ile Asn Asp Glu Leu Val Glu Trp Lys 225 230 235 240Arg Arg Gln Gln Ser Ala Cys Ile Gly Gly Pro Pro Asn Ala Cys Leu 245 250255 Asp Gln Leu Gln Thr Trp Phe Thr Ile Val Ala Glu Thr Leu Gln Gln 260265 270 Ile Arg Gln Gln Leu Lys Lys Leu Glu Glu Leu 
Glu Gln Lys Phe Thr275 280 285 Tyr Glu Pro Asp Pro Ile Thr Lys Asn Lys Gln Val Leu Ser AspArg 290 295 300 Thr Phe Leu Leu Phe Gln Gln Leu Ile Gln Ser Ser Phe ValVal Glu 305 310 315 320 Arg Gln Pro Cys Met Pro Thr His Pro Gln Arg ProLeu Val Leu Lys 325 330 335 Thr Gly Val Gln Phe Thr Val Lys Ser Arg LeuLeu Val Lys Leu Gln 340 345 350 Glu Ser Asn Leu Leu Thr Lys Val Lys CysHis Phe Asp Lys Asp Val 355 360 365 Asn Glu Lys Asn Thr Val Lys Gly PheArg Lys Phe Asn Ile Leu Gly 370 375 380 Thr His Thr Lys Val Met Asn MetGlu Glu Ser Thr Asn Gly Ser Leu 385 390 395 400 Ala Ala Glu Leu Arg HisLeu Gln Leu Lys Glu Gln Lys Asn Ala Gly 405 410 415 Asn Arg Thr Asn GluGly Pro Leu Ile Val Thr Glu Glu Leu His Ser 420 425 430 Leu Ser Phe GluThr Gln Leu Cys Gln Pro Gly Leu Val Ile Asp Leu 435 440 445 Glu Thr ThrSer Leu Pro Val Val Val Ile Ser Asn Val Ser Gln Leu 450 455 460 Pro SerGly Trp Ala Ser Ile Leu Trp Tyr Asn Met Leu Val Thr Glu 465 470 475 480Pro Arg Asn Leu Ser Phe Phe Leu Asn Pro Pro Cys Ala Trp Trp Ser 485 490495 Gln Leu Ser Glu Val Leu Ser Trp Gln Phe Ser Ser Val Thr Lys Arg 500505 510 Gly Leu Asn Ala Asp Gln Leu Ser Met Leu Gly Glu Lys Leu Leu Gly515 520 525 Pro Asn Ala Gly Pro Asp Gly Leu Ile Pro Trp Thr Arg Phe CysLys 530 535 540 Glu Asn Ile Asn Asp Lys Asn Phe Ser Phe Trp Pro Trp IleAsp Thr 545 550 555 560 Ile Leu Glu Leu Ile Lys Asn Asp Leu Leu Cys LeuTrp Asn Asp Gly 565 570 575 Cys Ile Met Gly Phe Ile Ser Lys Glu Arg GluArg Ala Leu Leu Lys 580 585 590 Asp Gln Gln Pro Gly Thr Phe Leu Leu ArgPhe Ser Glu Ser Ser Arg 595 600 605 Glu Gly Ala Ile Thr Phe Thr Trp ValGlu Arg Ser Gln Asn Gly Gly 610 615 620 Glu Pro Asp Phe His Ala Val GluPro Tyr Thr Lys Lys Glu Leu Ser 625 630 635 640 Ala Val Thr Phe Pro AspIle Ile Arg Asn Tyr Lys Val Met Ala Ala 645 650 655 Glu Asn Ile Pro GluAsn Pro Leu Lys Tyr Leu Tyr Pro Asn Ile Asp 660 665 670 Lys Asp His AlaPhe Gly Lys Tyr Tyr Ser Arg Pro Lys Glu Ala Pro 675 680 685 Glu Pro MetGlu Leu Asp Asp Pro Lys Arg Thr Gly Tyr Ile Lys Thr 690 695 700 
Glu LeuIle Ser Val Ser Glu Val His Pro Ser Arg Leu Gln Thr Thr 705 710 715 720Asp Asn Leu Leu Pro Met Ser Pro Glu Glu Phe Asp Glu Met Ser Arg 725 730735 Ile Val Gly Pro Glu Phe Asp Ser Met Met Ser Thr Val 740 745 770amino acids amino acid single linear protein NO 8 Met Ala Gln Trp AsnGln Leu Gln Gln Leu Asp Thr Arg Tyr Leu Lys 1 5 10 15 Gln Leu His GlnLeu Tyr Ser Asp Thr Phe Pro Met Glu Leu Arg Gln 20 25 30 Phe Leu Ala ProTrp Ile Glu Ser Gln Asp Trp Ala Tyr Ala Ala Ser 35 40 45 Lys Glu Ser HisAla Thr Leu Val Phe His Asn Leu Leu Gly Glu Ile 50 55 60 Asp Gln Gln TyrSer Arg Phe Leu Gln Glu Ser Asn Val Leu Tyr Gln 65 70 75 80 His Asn LeuArg Arg Ile Lys Gln Phe Leu Gln Ser Arg Tyr Leu Glu 85 90 95 Lys Pro MetGlu Ile Ala Arg Ile Val Ala Arg Cys Leu Trp Glu Glu 100 105 110 Ser ArgLeu Leu Gln Thr Ala Ala Thr Ala Ala Gln Gln Gly Gly Gln 115 120 125 AlaAsn His Pro Thr Ala Ala Val Val Thr Glu Lys Gln Gln Met Leu 130 135 140Glu Gln His Leu Gln Asp Val Arg Lys Arg Val Gln Asp Leu Glu Gln 145 150155 160 Lys Met Lys Val Val Glu Asn Leu Gln Asp Asp Phe Asp Phe Asn Tyr165 170 175 Lys Thr Leu Lys Ser Gln Gly Asp Met Gln Asp Leu Asn Gly AsnAsn 180 185 190 Gln Ser Val Thr Arg Gln Lys Met Gln Gln Leu Glu Gln MetLeu Thr 195 200 205 Ala Leu Asp Gln Met Arg Arg Ser Ile Val Ser Glu LeuAla Gly Leu 210 215 220 Leu Ser Ala Met Glu Tyr Val Gln Lys Thr Leu ThrAsp Glu Glu Leu 225 230 235 240 Ala Asp Trp Lys Arg Arg Gln Gln Ile AlaCys Ile Gly Gly Pro Pro 245 250 255 Asn Ile Cys Leu Asp Arg Leu Glu AsnTrp Ile Thr Ser Leu Ala Glu 260 265 270 Ser Gln Leu Gln Thr Arg Gln GlnIle Lys Lys Leu Glu Glu Leu Gln 275 280 285 Gln Lys Val Ser Tyr Lys GlyAsp Pro Ile Val Gln His Arg Pro Met 290 295 300 Leu Glu Glu Arg Ile ValGlu Leu Phe Arg Asn Leu Met Lys Ser Ala 305 310 315 320 Phe Val Val GluArg Gln Pro Cys Met Pro Met His Pro Asp Arg Pro 325 330 335 Leu Val IleLys Thr Gly Val Gln Phe Thr Thr Lys Val Arg Leu Leu 340 345 350 Val LysPhe Pro Glu Leu Asn Tyr Gln Leu Lys Ile Lys Val Cys Ile 355 360 365 AspLys Asp 
Ser Gly Asp Val Ala Ala Leu Arg Gly Ser Arg Lys Phe 370 375 380Asn Ile Leu Gly Thr Asn Thr Lys Val Met Asn Met Glu Glu Ser Asn 385 390395 400 Asn Gly Ser Leu Ser Ala Glu Phe Lys His Leu Thr Leu Arg Glu Gln405 410 415 Arg Cys Gly Asn Gly Gly Arg Ala Asn Cys Asp Ala Ser Leu IleVal 420 425 430 Thr Glu Glu Leu His Leu Ile Thr Phe Glu Thr Glu Val TyrHis Gln 435 440 445 Gly Leu Lys Ile Asp Leu Glu Thr His Ser Leu Pro ValVal Val Ile 450 455 460 Ser Asn Ile Cys Gln Met Pro Asn Ala Trp Ala SerIle Leu Trp Tyr 465 470 475 480 Asn Met Leu Thr Asn Asn Pro Lys Asn ValAsn Phe Phe Thr Lys Pro 485 490 495 Pro Ile Gly Thr Trp Asp Gln Val AlaGlu Val Leu Ser Trp Gln Phe 500 505 510 Ser Ser Thr Thr Lys Arg Gly LeuSer Ile Glu Gln Leu Thr Thr Leu 515 520 525 Ala Glu Lys Leu Leu Gly ProGly Val Asn Tyr Ser Gly Cys Gln Ile 530 535 540 Thr Trp Ala Lys Phe CysLys Glu Asn Met Ala Gly Lys Gly Phe Ser 545 550 555 560 Phe Trp Val TrpLeu Asp Asn Ile Ile Asp Leu Val Lys Lys Tyr Ile 565 570 575 Leu Ala LeuTrp Asn Glu Gly Tyr Ile Met Gly Phe Ile Ser Lys Glu 580 585 590 Arg GluArg Ala Ile Leu Ser Thr Lys Pro Pro Gly Thr Phe Leu Leu 595 600 605 ArgPhe Ser Glu Ser Ser Lys Glu Gly Gly Val Thr Phe Thr Trp Val 610 615 620Glu Lys Asp Ile Ser Gly Lys Thr Gln Ile Gln Ser Val Glu Pro Tyr 625 630635 640 Thr Lys Gln Gln Leu Asn Asn Met Ser Phe Ala Glu Ile Ile Met Gly645 650 655 Tyr Lys Ile Met Asp Ala Thr Asn Ile Leu Val Ser Pro Leu ValTyr 660 665 670 Leu Tyr Pro Asp Ile Pro Lys Glu Glu Ala Phe Gly Lys TyrCys Arg 675 680 685 Pro Glu Ser Gln Glu His Pro Glu Ala Asp Pro Gly SerAla Ala Pro 690 695 700 Tyr Leu Lys Thr Lys Phe Ile Cys Val Thr Pro ThrThr Cys Ser Asn 705 710 715 720 Thr Ile Asp Leu Pro Met Ser Pro Arg ThrLeu Asp Ser Leu Met Gln 725 730 735 Phe Gly Asn Asn Gly Glu Gly Ala GluPro Ser Ala Gly Gly Gln Phe 740 745 750 Glu Ser Leu Thr Phe Asp Met AspLeu Thr Ser Glu Cys Ala Thr Ser 755 760 765 Pro Met 770 749 amino acidsamino acid single linear protein NO 9 Met Ser Gln Trp Asn Gln Val GlnGln Leu Glu Ile Lys 
Phe Leu Glu 1 5 10 15 Gln Val Asp Gln Phe Tyr AspAsp Asn Phe Pro Met Glu Ile Arg His 20 25 30 Leu Leu Ala Gln Trp Ile GluThr Gln Asp Trp Glu Val Ala Ser Asn 35 40 45 Asn Glu Thr Met Ala Thr IleLeu Leu Gln Asn Leu Leu Ile Gln Leu 50 55 60 Asp Glu Gln Leu Gly Arg ValSer Lys Glu Lys Asn Leu Leu Leu Ile 65 70 75 80 His Asn Leu Lys Arg IleArg Lys Val Leu Gln Gly Lys Phe His Gly 85 90 95 Asn Pro Met His Val AlaVal Val Ile Ser Asn Cys Leu Arg Glu Glu 100 105 110 Arg Arg Ile Leu AlaAla Ala Asn Met Pro Ile Gln Gly Pro Leu Glu 115 120 125 Lys Ser Leu GlnSer Ser Ser Val Ser Glu Arg Gln Arg Asn Val Glu 130 135 140 His Lys ValSer Ala Ile Lys Asn Ser Val Gln Met Thr Glu Gln Asp 145 150 155 160 ThrLys Tyr Leu Glu Asp Leu Gln Asp Glu Phe Asp Tyr Arg Tyr Lys 165 170 175Thr Ile Gln Thr Met Asp Gln Gly Asp Lys Asn Ser Ile Leu Val Asn 180 185190 Gln Glu Val Leu Thr Leu Leu Gln Glu Met Leu Asn Ser Leu Asp Phe 195200 205 Lys Arg Lys Glu Ala Leu Ser Lys Met Thr Gln Ile Val Asn Glu Thr210 215 220 Asp Leu Leu Met Asn Ser Met Leu Leu Glu Glu Leu Gln Asp TrpLys 225 230 235 240 Lys Arg Gln Gln Ile Ala Cys Ile Gly Gly Pro Leu HisAsn Gly Leu 245 250 255 Asp Gln Leu Gln Asn Cys Phe Thr Leu Leu Ala GluSer Leu Phe Gln 260 265 270 Leu Arg Gln Gln Leu Glu Lys Leu Gln Glu GlnSer Thr Lys Met Thr 275 280 285 Tyr Glu Gly Asp Pro Ile Pro Ala Gln ArgAla His Leu Leu Glu Arg 290 295 300 Ala Thr Phe Leu Ile Tyr Asn Leu PheLys Asn Ser Phe Val Val Glu 305 310 315 320 Arg Gln Pro Cys Met Pro ThrHis Pro Gln Arg Pro Met Val Leu Lys 325 330 335 Thr Leu Ile Gln Phe ThrVal Lys Leu Arg Leu Leu Ile Lys Leu Pro 340 345 350 Glu Leu Asn Tyr GlnVal Lys Val Lys Ala Ser Ile Asp Lys Asn Val 355 360 365 Ser Thr Leu SerAsn Arg Arg Phe Val Leu Cys Gly Thr His Val Lys 370 375 380 Ala Met SerSer Glu Glu Ser Ser Asn Gly Ser Leu Ser Val Glu Phe 385 390 395 400 ArgHis Leu Gln Pro Lys Glu Met Lys Cys Ser Thr Gly Ser Lys Gly 405 410 415Asn Glu Gly Cys His Met Val Thr Glu Glu Leu His Ser Ile Thr Phe 420 425430 Glu Thr Gln Ile Cys Leu 
Tyr Gly Leu Thr Ile Asn Leu Glu Thr Ser 435440 445 Ser Leu Pro Val Val Met Ile Ser Asn Val Ser Gln Leu Pro Asn Ala450 455 460 Trp Ala Ser Ile Ile Trp Tyr Asn Val Ser Thr Asn Asp Ser GlnAsn 465 470 475 480 Leu Val Phe Phe Asn Asn Pro Pro Ser Val Thr Leu GlyGln Leu Leu 485 490 495 Glu Val Met Ser Trp Gln Phe Ser Ser Tyr Val GlyArg Gly Leu Asn 500 505 510 Ser Glu Gln Leu Asn Met Leu Ala Glu Lys LeuThr Val Gln Ser Asn 515 520 525 Tyr Asn Asp Gly His Leu Thr Trp Ala LysPhe Cys Lys Glu His Leu 530 535 540 Pro Gly Lys Thr Phe Thr Phe Trp ThrTrp Leu Glu Ala Ile Leu Asp 545 550 555 560 Leu Ile Lys Lys His Ile LeuPro Leu Trp Ile Asp Gly Tyr Ile Met 565 570 575 Gly Phe Val Ser Lys GluLys Glu Arg Leu Leu Leu Lys Asp Lys Met 580 585 590 Pro Gly Thr Phe LeuLeu Arg Phe Ser Glu Ser His Leu Gly Gly Ile 595 600 605 Thr Phe Thr TrpVal Asp Gln Ser Glu Asn Gly Glu Val Arg Phe His 610 615 620 Ser Val GluPro Tyr Asn Lys Gly Arg Leu Ser Ala Leu Ala Phe Ala 625 630 635 640 AspIle Leu Arg Asp Tyr Lys Val Ile Met Ala Glu Asn Ile Pro Glu 645 650 655Asn Pro Leu Lys Tyr Leu Tyr Pro Asp Ile Pro Lys Asp Lys Ala Phe 660 665670 Gly Lys His Tyr Ser Ser Gln Pro Cys Glu Val Ser Arg Pro Thr Glu 675680 685 Arg Gly Asp Lys Gly Tyr Val Pro Ser Val Phe Ile Pro Ile Ser Thr690 695 700 Ile Arg Ser Asp Ser Thr Glu Pro Gln Ser Pro Ser Asp Leu LeuPro 705 710 715 720 Met Ser Pro Ser Ala Tyr Ala Val Leu Arg Glu Asn LeuSer Pro Thr 725 730 735 Thr Ile Glu Thr Ala Met Asn Ser Pro Tyr Ser AlaGlu 740 745 793 amino acids amino acid single linear protein NO 10 MetAla Gly Trp Ile Gln Ala Gln Gln Leu Gln Gly Asp Ala Leu Arg 1 5 10 15Gln Met Gln Val Leu Tyr Gly Gln His Phe Pro Ile Glu Val Arg His 20 25 30Tyr Leu Ala Gln Trp Ile Glu Ser Gln Pro Trp Asp Ala Ile Asp Leu 35 40 45Asp Asn Pro Gln Asp Arg Gly Gln Ala Thr Gln Leu Leu Glu Gly Leu 50 55 60Val Gln Glu Leu Gln Lys Lys Ala Glu His Gln Val Gly Glu Asp Gly 65 70 7580 Phe Leu Leu Lys Ile Lys Leu Gly His Tyr Ala Thr Gln Leu Gln Asn 85 9095 Thr Tyr Asp Arg Cys Pro Met Glu 
Leu Val Arg Cys Ile Arg His Ile 100105 110 Leu Tyr Asn Glu Gln Arg Leu Val Arg Glu Ala Asn Asn Cys Ser Ser115 120 125 Pro Ala Gly Val Leu Val Asp Ala Met Ser Gln Lys His Leu GlnIle 130 135 140 Asn Gln Arg Phe Glu Glu Leu Arg Leu Ile Thr Gln Asp ThrGlu Asn 145 150 155 160 Glu Leu Lys Lys Leu Gln Gln Thr Gln Glu Tyr PheIle Ile Gln Tyr 165 170 175 Gln Glu Ser Leu Arg Ile Gln Ala Gln Phe AlaGln Leu Gly Gln Leu 180 185 190 Asn Pro Gln Glu Arg Met Ser Arg Glu ThrAla Leu Gln Gln Lys Gln 195 200 205 Val Ser Leu Glu Thr Trp Leu Gln ArgGlu Ala Gln Thr Leu Gln Gln 210 215 220 Tyr Arg Val Glu Leu Ala Glu LysHis Gln Lys Thr Leu Gln Leu Leu 225 230 235 240 Arg Lys Gln Gln Thr IleIle Leu Asp Asp Glu Leu Ile Gln Trp Lys 245 250 255 Arg Arg Gln Gln LeuAla Gly Asn Gly Gly Pro Pro Glu Gly Ser Leu 260 265 270 Asp Val Leu GlnSer Trp Cys Glu Lys Leu Ala Glu Ile Ile Trp Gln 275 280 285 Asn Arg GlnGln Ile Arg Arg Ala Glu His Leu Cys Gln Gln Leu Pro 290 295 300 Ile ProGly Pro Val Glu Glu Met Leu Ala Glu Val Asn Ala Thr Ile 305 310 315 320Thr Asp Ile Ile Ser Ala Leu Val Thr Ser Thr Phe Ile Ile Glu Lys 325 330335 Gln Pro Pro Gln Val Leu Lys Thr Gln Thr Lys Phe Ala Ala Thr Val 340345 350 Arg Leu Leu Val Gly Gly Lys Leu Asn Val His Met Asn Pro Pro Gln355 360 365 Val Lys Ala Thr Ile Ile Ser Glu Gln Gln Ala Lys Ser Leu LeuLys 370 375 380 Asn Glu Asn Thr Arg Asn Glu Cys Ser Gly Glu Ile Leu AsnAsn Cys 385 390 395 400 Cys Val Met Glu Tyr His Gln Ala Thr Gly Thr LeuSer Ala His Phe 405 410 415 Arg Asn Met Ser Leu Lys Arg Ile Lys Arg AlaAsp Arg Arg Gly Ala 420 425 430 Glu Ser Val Thr Glu Glu Lys Phe Thr ValLeu Phe Glu Ser Gln Phe 435 440 445 Ser Val Gly Ser Asn Glu Leu Val PheGln Val Lys Thr Leu Ser Leu 450 455 460 Pro Val Val Val Ile Val His GlySer Gln Asp His Asn Ala Thr Ala 465 470 475 480 Thr Val Leu Trp Asp AsnAla Phe Ala Glu Pro Gly Arg Val Pro Phe 485 490 495 Ala Val Pro Asp LysVal Leu Trp Pro Gln Leu Cys Glu Ala Leu Asn 500 505 510 Met Lys Phe LysAla Glu Val Gln Ser Asn Arg Gly Leu Thr Lys Glu 
515 520 525 Asn Leu ValPhe Leu Ala Gln Lys Leu Phe Asn Ile Ser Ser Asn His 530 535 540 Leu GluAsp Tyr Asn Ser Met Ser Val Ser Trp Ser Gln Phe Asn Arg 545 550 555 560Glu Asn Leu Pro Gly Trp Asn Tyr Thr Phe Trp Gln Trp Phe Asp Gly 565 570575 Val Met Glu Val Leu Lys Lys His His Lys Pro His Trp Asn Asp Gly 580585 590 Ala Ile Leu Gly Phe Val Asn Lys Gln Gln Ala His Asp Leu Leu Ile595 600 605 Asn Lys Pro Asp Gly Thr Phe Leu Leu Arg Phe Ser Asp Ser GluIle 610 615 620 Gly Gly Ile Thr Ile Ala Trp Lys Phe Asp Ser Pro Asp ArgAsn Leu 625 630 635 640 Trp Asn Leu Lys Pro Phe Thr Thr Arg Asp Phe SerIle Arg Ser Leu 645 650 655 Ala Asp Arg Leu Gly Asp Leu Asn Tyr Leu IleTyr Val Phe Pro Asp 660 665 670 Arg Pro Lys Asp Glu Val Phe Ala Lys TyrTyr Thr Pro Val Leu Ala 675 680 685 Lys Ala Val Asp Gly Tyr Val Lys ProGln Ile Lys Gln Val Val Pro 690 695 700 Glu Phe Val Asn Ala Ser Thr AspAla Gly Ala Ser Ala Thr Tyr Met 705 710 715 720 Asp Gln Ala Pro Ser ProVal Val Cys Pro Gln Pro His Tyr Asn Met 725 730 735 Tyr Pro Pro Asn ProAsp Pro Val Leu Asp Gln Asp Gly Glu Phe Asp 740 745 750 Leu Asp Glu SerMet Asp Val Ala Arg His Val Glu Glu Leu Leu Arg 755 760 765 Arg Pro MetAsp Ser Leu Asp Ala Arg Leu Ser Pro Pro Ala Gly Leu 770 775 780 Phe ThrSer Ala Arg Ser Ser Leu Ser 785 790 786 amino acids amino acid singlelinear protein NO 11 Met Ala Met Trp Ile Gln Ala Gln Gln Leu Gln Gly AspAla Leu His 1 5 10 15 Gln Met Gln Ala Leu Tyr Gly Gln His Phe Pro IleGlu Val Arg His 20 25 30 Tyr Leu Ser Gln Trp Ile Glu Ser Gln Ala Trp AspSer Ile Asp Leu 35 40 45 Asp Asn Pro Gln Glu Asn Ile Lys Ala Thr Gln LeuLeu Glu Gly Leu 50 55 60 Val Gln Glu Leu Gln Lys Lys Ala Glu His Gln ValGly Glu Asp Gly 65 70 75 80 Phe Leu Leu Lys Ile Lys Leu Gly His Tyr AlaThr Gln Leu Gln Ser 85 90 95 Thr Tyr Asp Arg Cys Pro Met Glu Leu Val ArgCys Ile Arg His Ile 100 105 110 Leu Tyr Asn Glu Gln Arg Leu Val Arg GluAla Asn Asn Gly Ser Ser 115 120 125 Pro Ala Gly Ser Leu Ala Asp Ala MetSer Gln Lys His Leu Gln Ile 130 135 140 Asn Gln Thr Phe 
Glu Glu Leu ArgLeu Ile Thr Gln Asp Thr Glu Asn 145 150 155 160 Glu Leu Lys Lys Leu GlnGln Thr Gln Glu Tyr Phe Ile Ile Gln Tyr 165 170 175 Gln Glu Ser Leu ArgIle Gln Ala Gln Phe Ala Gln Leu Gly Gln Leu 180 185 190 Asn Pro Gln GluArg Met Ser Arg Glu Thr Ala Leu Gln Gln Lys Gln 195 200 205 Val Ser LeuGlu Thr Trp Leu Gln Arg Glu Ala Gln Thr Leu Gln Gln 210 215 220 Tyr ArgVal Glu Leu Ala Glu Lys His Gln Lys Thr Leu Gln Leu Leu 225 230 235 240Arg Lys Gln Gln Thr Ile Ile Leu Asp Asp Glu Leu Ile Gln Trp Lys 245 250255 Arg Arg Gln Gln Leu Ala Gly Asn Gly Gly Pro Pro Glu Gly Ser Leu 260265 270 Asp Val Leu Gln Ser Trp Cys Glu Lys Leu Ala Glu Ile Ile Trp Gln275 280 285 Asn Arg Gln Gln Ile Arg Arg Ala Glu His Leu Cys Gln Gln LeuPro 290 295 300 Ile Pro Gly Pro Val Glu Glu Met Leu Ala Glu Val Asn AlaThr Ile 305 310 315 320 Thr Asp Ile Ile Ser Ala Leu Val Thr Ser Thr PheIle Ile Glu Lys 325 330 335 Gln Pro Pro Gln Val Leu Lys Thr Gln Thr LysPhe Ala Ala Thr Val 340 345 350 Arg Leu Leu Val Gly Gly Lys Leu Asn ValHis Met Asn Pro Pro Gln 355 360 365 Val Lys Ala Thr Ile Ile Ser Glu GlnGln Ala Lys Ser Leu Leu Lys 370 375 380 Asn Glu Asn Thr Arg Asn Asp TyrSer Gly Glu Ile Leu Asn Asn Cys 385 390 395 400 Cys Val Met Glu Tyr HisGln Ala Thr Gly Thr Leu Ser Ala His Phe 405 410 415 Arg Asn Met Ser LeuLys Arg Ile Lys Arg Ser Asp Arg Arg Gly Ala 420 425 430 Gly Ser Val ThrGlu Glu Lys Phe Thr Ile Leu Phe Asp Ser Gln Phe 435 440 445 Ser Val GlyGly Asn Glu Leu Val Phe Gln Val Lys Thr Leu Ser Leu 450 455 460 Pro ValVal Val Ile Val His Gly Ser Gln Asp Asn Asn Ala Thr Ala 465 470 475 480Thr Val Leu Trp Asp Asn Ala Phe Ala Glu Pro Gly Arg Val Pro Phe 485 490495 Ala Val Pro Asp Lys Val Leu Trp Pro Gln Leu Cys Glu Ala Leu Asn 500505 510 Met Lys Phe Lys Ala Glu Val Gln Ser Asn Arg Gly Leu Thr Lys Glu515 520 525 Asn Leu Val Phe Leu Ala Gln Lys Leu Phe Asn Ile Ser Ser AsnHis 530 535 540 Leu Glu Asp Tyr Asn Ser Met Ser Val Ser Trp Ser Gln PheAsn Arg 545 550 555 560 Glu Asn Leu Pro Gly Arg Asn Tyr Thr Phe Trp 
GlnTrp Phe Asp Gly 565 570 575 Val Met Glu Val Leu Lys Lys His Leu Lys ProHis Trp Asn Asp Gly 580 585 590 Ala Ile Leu Gly Phe Val Asn Lys Gln GlnAla His Asp Leu Leu Ile 595 600 605 Asn Lys Pro Asp Gly Thr Phe Leu LeuArg Phe Ser Asp Ser Glu Ile 610 615 620 Gly Gly Ile Thr Ile Ala Trp LysPhe Asp Ser Gln Glu Arg Met Phe 625 630 635 640 Trp Asn Leu Met Pro PheThr Thr Arg Asp Phe Ser Ile Arg Ser Leu 645 650 655 Ala Asp Arg Leu GlyAsp Leu Asn Tyr Leu Ile Tyr Val Phe Pro Asp 660 665 670 Arg Pro Lys AspGlu Val Tyr Ser Lys Tyr Tyr Thr Pro Val Pro Cys 675 680 685 Glu Pro AlaThr Ala Lys Ala Ala Asp Gly Tyr Val Lys Pro Gln Ile 690 695 700 Lys GlnVal Val Pro Glu Phe Ala Asn Ala Ser Thr Asp Ala Gly Ser 705 710 715 720Gly Ala Thr Tyr Met Asp Gln Ala Pro Ser Pro Val Val Cys Pro Gln 725 730735 Ala His Tyr Asn Met Tyr Pro Pro Asn Pro Asp Ser Val Leu Asp Thr 740745 750 Asp Gly Asp Phe Asp Leu Glu Asp Thr Met Asp Val Ala Arg Arg Val755 760 765 Glu Glu Leu Leu Gly Arg Pro Met Asp Ser Gln Trp Ile Pro HisAla 770 775 780 Gln Ser 785 837 amino acids amino acid single linearprotein NO 12 Met Ser Leu Trp Gly Leu Ile Ser Lys Met Ser Pro Glu LysLeu Gln 1 5 10 15 Arg Leu Tyr Val Asp Phe Pro Gln Arg Leu Arg His LeuLeu Ala Asp 20 25 30 Trp Leu Glu Ser Gln Pro Trp Glu Phe Leu Val Gly SerAsp Ala Phe 35 40 45 Cys Tyr Asn Met Ala Ser Ala Leu Leu Ser Ala Thr ValGln Arg Leu 50 55 60 Gln Ala Thr Ala Gly Glu Gln Gly Lys Gly Asn Ser IleLeu Pro His 65 70 75 80 Ile Ser Thr Leu Glu Ser Ile Tyr Gln Arg Asp ProLeu Lys Leu Val 85 90 95 Ala Thr Ile Arg Gln Ile Leu Gln Gly Glu Lys LysAla Val Ile Glu 100 105 110 Glu Phe Arg His Leu Pro Gly Pro Phe His ArgLys Gln Glu Glu Leu 115 120 125 Lys Phe Thr Thr Pro Leu Gly Arg Leu HisHis Arg Val Arg Glu Thr 130 135 140 Arg Leu Leu Arg Glu Ser Leu His LeuGly Pro Lys Thr Gly Gln Val 145 150 155 160 Ser Leu Gln Asn Leu Ile AspPro Pro Leu Asn Gly Pro Gly Pro Ser 165 170 175 Glu Asp Leu Pro Thr IleLeu Gln Gly Thr Val Gly Asp Leu Glu Thr 180 185 190 Thr Gln Pro Leu ValLeu Leu Arg 
Ile Gln Ile Trp Lys Arg Gln Gln 195 200 205 Gln Leu Ala GlyAsn Gly Thr Pro Phe Glu Glu Ser Leu Ala Gly Leu 210 215 220 Gln Glu ArgCys Glu Ser Leu Val Glu Ile Tyr Ser Gln Leu His Gln 225 230 235 240 GluIle Gly Ala Ala Ser Gly Glu Leu Glu Pro Lys Thr Arg Ala Ser 245 250 255Leu Ile Ser Arg Leu Asp Glu Val Leu Arg Thr Leu Val Thr Ser Ser 260 265270 Phe Leu Val Glu Lys Gln Pro Pro Gln Val Leu Lys Thr Gln Thr Lys 275280 285 Phe Gln Ala Gly Val Arg Phe Leu Leu Gly Leu Gln Phe Leu Gly Thr290 295 300 Ser Thr Lys Pro Pro Met Val Arg Ala Asp Met Val Thr Glu LysGln 305 310 315 320 Ala Arg Glu Leu Ser Leu Ser Gln Gly Pro Gly Thr GlyVal Glu Ser 325 330 335 Thr Gly Glu Ile Met Asn Asn Thr Val Pro Leu GluAsn Ser Ile Pro 340 345 350 Ser Asn Cys Cys Ser Ala Leu Phe Lys Asn LeuLeu Leu Lys Lys Ile 355 360 365 Lys Arg Cys Glu Arg Lys Gly Thr Glu SerVal Thr Glu Glu Lys Cys 370 375 380 Ala Val Leu Phe Ser Thr Ser Phe ThrLeu Gly Pro Asn Lys Leu Leu 385 390 395 400 Ile Gln Leu Gln Ala Leu SerLeu Ser Leu Val Val Ile Val His Gly 405 410 415 Asn Gln Asp Asn Asn AlaLys Ala Thr Ile Leu Trp Asp Asn Ala Phe 420 425 430 Ser Glu Met Asp ArgVal Pro Phe Val Val Gly Glu Arg Val Pro Trp 435 440 445 Glu Lys Met CysGlu Thr Leu Asn Leu Lys Phe Met Val Glu Val Gly 450 455 460 Thr Ser ArgGly Leu Leu Pro Glu His Phe Leu Phe Leu Ala Gln Lys 465 470 475 480 IlePhe Asn Asp Asn Ser Leu Ser Val Glu Ala Phe Gln His Arg Cys 485 490 495Val Ser Trp Ser Gln Phe Asn Lys Glu Ile Leu Leu Gly Arg Gly Phe 500 505510 Thr Phe Trp Gln Trp Phe Asp Gly Val Leu Asp Leu Thr Lys Arg Cys 515520 525 Leu Arg Ser Tyr Trp Ser Asp Arg Leu Ile Ile Gly Phe Ile Ser Lys530 535 540 Gln Tyr Val Thr Ser Leu Leu Leu Asn Glu Pro Asp Gly Thr PheLeu 545 550 555 560 Leu Arg Phe Ser Asp Ser Glu Ile Gly Gly Ile Thr IleAla His Val 565 570 575 Ile Arg Gly Gln Asp Gly Ser Ser Gln Ile Glu AsnIle Gln Pro Phe 580 585 590 Ser Ala Lys Asp Leu Ser Ile Arg Ser Leu GlyAsp Arg Ile Arg Asp 595 600 605 Leu Ala Gln Leu Lys Asn Leu Tyr Pro LysLys Pro Lys Asp Glu Ala 
610 615 620 Phe Arg Ser His Tyr Lys Pro Glu GlnMet Gly Lys Asp Gly Arg Gly 625 630 635 640 Tyr Val Ser Thr Thr Ile LysMet Thr Val Glu Arg Asp Gln Pro Leu 645 650 655 Pro Thr Pro Glu Pro GlnMet Pro Ala Met Val Pro Pro Tyr Asp Leu 660 665 670 Gly Met Ala Pro AspAla Ser Met Gln Leu Ser Ser Asp Met Gly Tyr 675 680 685 Pro Pro Gln SerIle His Ser Phe Gln Ser Leu Glu Glu Ser Met Ser 690 695 700 Val Leu ProSer Phe Gln Glu Pro His Leu Gln Met Pro Pro Asn Met 705 710 715 720 SerGln Ile Thr Met Pro Phe Asp Gln Pro His Pro Gln Gly Leu Leu 725 730 735Gln Cys Gln Ser Gln Glu His Ala Val Ser Ser Pro Glu Pro Met Leu 740 745750 Trp Ser Asp Val Thr Met Val Glu Asp Ser Cys Leu Thr Gln Pro Val 755760 765 Gly Gly Phe Pro Gln Gly Thr Trp Val Ser Glu Asp Met Tyr Pro Pro770 775 780 Leu Leu Pro Pro Thr Glu Gln Asp Leu Thr Lys Leu Leu Leu GluAsn 785 790 795 800 Gln Gly Glu Gly Gly Gly Ser Leu Gly Ser Gln Pro LeuLeu Lys Pro 805 810 815 Ser Pro Tyr Gly Gln Ser Gly Ile Ser Leu Ser HisLeu Asp Leu Arg 820 825 830 Thr Asn Pro Ser Trp 835 761 amino acidsamino acid single linear protein NO 13 Met Ser Leu Trp Lys Arg Ile SerSer His Val Asp Cys Glu Gln Arg 1 5 10 15 Met Ala Ala Tyr Tyr Glu GluLys Gly Met Leu Glu Leu Arg Leu Cys 20 25 30 Leu Ala Pro Trp Ile Glu AspArg Ile Met Ser Glu Gln Ile Thr Pro 35 40 45 Asn Thr Thr Asp Gln Leu GluArg Val Ala Leu Lys Phe Asn Glu Asp 50 55 60 Leu Gln Gln Lys Leu Leu SerThr Arg Thr Ala Ser Asp Gln Ala Leu 65 70 75 80 Lys Phe Arg Val Val GluLeu Cys Ala Leu Ile Gln Arg Ile Ser Ala 85 90 95 Val Glu Leu Tyr Thr HisLeu Arg Ser Gly Leu Gln Lys Glu Leu Gln 100 105 110 Leu Val Thr Glu LysSer Val Ala Ala Thr Ala Gly Gln Ser Met Pro 115 120 125 Leu Asn Pro TyrAsn Met Asn Asn Thr Pro Met Val Thr Gly Tyr Met 130 135 140 Val Asp ProSer Asp Leu Leu Ala Val Ser Asn Ser Cys Asn Pro Pro 145 150 155 160 ValVal Gln Gly Ile Gly Pro Ile His Asn Val Gln Asn Thr Gly Ile 165 170 175Ala Ser Pro Ala Leu Gly Met Val Thr Pro Lys Val Glu Leu Tyr Glu 180 185190 Val Gln His Gln Ile Met Gln Ser 
Leu Asn Glu Phe Gly Asn Cys Ala 195200 205 Asn Ala Leu Lys Leu Leu Ala Gln Asn Tyr Ser Tyr Met Leu Asn Ser210 215 220 Thr Ser Ser Pro Asn Ala Glu Ala Ala Tyr Arg Ser Leu Ile AspGlu 225 230 235 240 Lys Ala Ala Ile Val Leu Thr Met Arg Arg Ser Phe MetTyr Tyr Glu 245 250 255 Ser Leu His Glu Met Val Ile His Glu Leu Lys AsnTrp Arg His Gln 260 265 270 Gln Ala Gln Ala Gly Asn Gly Ala Pro Phe AsnGlu Gly Ser Leu Asp 275 280 285 Asp Ile Gln Arg Cys Phe Glu Met Leu GluSer Phe Ile Ala His Met 290 295 300 Leu Ala Ala Val Lys Glu Leu Met ArgVal Arg Leu Val Thr Glu Glu 305 310 315 320 Pro Glu Leu Thr His Leu LeuGlu Gln Val Gln Asn Ala Gln Lys Asn 325 330 335 Leu Val Cys Ser Ala PheIle Val Asp Lys Gln Pro Pro Gln Val Met 340 345 350 Lys Thr Asn Thr ArgPhe Ala Ala Ser Val Arg Trp Leu Ile Gly Ser 355 360 365 Gln Leu Gly IleHis Asn Asn Pro Pro Thr Val Glu Cys Ile Ile Met 370 375 380 Ser Glu IleGln Ser Gln Arg Phe Val Thr Arg Asn Thr Gln Met Asp 385 390 395 400 AsnSer Ser Leu Ser Gly Gln Ser Ser Gly Glu Ile Gln Asn Ala Ser 405 410 415Ser Thr Met Glu Tyr Gln Gln Asn Asn His Val Phe Ser Ala Ser Phe 420 425430 Arg Asn Met Gln Leu Lys Lys Ile Lys Arg Ala Glu Lys Lys Gly Thr 435440 445 Glu Ser Val Met Asp Glu Lys Phe Ala Leu Phe Phe Tyr Thr Thr Thr450 455 460 Thr Val Asn Asp Phe Gln Ile Arg Val Trp Thr Leu Ser Leu ProVal 465 470 475 480 Val Val Ile Val His Gly Asn Gln Glu Pro Gln Ser TrpAla Thr Ile 485 490 495 Thr Trp Asp Asn Ala Phe Ala Glu Ile Val Arg AspPro Phe Met Ile 500 505 510 Thr Asp Arg Val Thr Trp Ala Gln Leu Ser ValAla Leu Asn Ile Lys 515 520 525 Phe Gly Ser Cys Thr Gly Arg Ser Leu ThrIle Asp Asn Leu Asp Phe 530 535 540 Leu Tyr Glu Lys Leu Gln Arg Glu GluArg Ser Glu Tyr Ile Thr Trp 545 550 555 560 Asn Gln Phe Cys Lys Glu ProMet Pro Asp Arg Ser Phe Thr Phe Trp 565 570 575 Glu Trp Phe Phe Ala IleMet Lys Leu Thr Lys Asp His Met Leu Gly 580 585 590 Met Trp Lys Ala GlyCys Ile Met Gly Phe Ile Asn Lys Thr Lys Ala 595 600 605 Gln Thr Asp LeuLeu Arg Ser Val Tyr Gly Ile Gly Thr Phe Leu Leu 
610 615 620 Arg Phe SerAsp Ser Glu Leu Gly Gly Val Thr Ile Ala Tyr Val Asn 625 630 635 640 GluAsn Gly Leu Val Thr Met Leu Ala Pro Trp Thr Ala Arg Asp Phe 645 650 655Gln Val Leu Asn Leu Ala Asp Arg Ile Arg Asp Leu Asp Val Leu Cys 660 665670 Trp Leu His Pro Ser Asp Arg Asn Ala Ser Pro Val Lys Arg Asp Val 675680 685 Ala Phe Gly Glu Phe Tyr Ser Lys Arg Gln Glu Pro Glu Pro Leu Val690 695 700 Leu Asp Pro Val Thr Gly Tyr Val Lys Ser Thr Leu His Val HisVal 705 710 715 720 Cys Arg Asn Gly Glu Asn Gly Ser Thr Ser Gly Thr ProHis His Ala 725 730 735 Gln Glu Ser Met Gln Leu Gly Asn Gly Asp Phe GlyMet Ala Asp Phe 740 745 750 Asp Thr Ile Thr Asn Phe Glu Asn Phe 755 760
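
The sequences above are written in three-letter residue codes interleaved with position numbers. As an illustration only, the following minimal Python sketch converts such a listing fragment to one-letter codes and scans it for the consensus recited for SEQ ID NO: 1 (Arg Xaa ^(H)Xaa Leu Xaa Xaa Trp ^(H)Xaa Glu Xaa Gln Xaa Trp, where ^(H)Xaa is Ile, Leu, Val, Phe, or Tyr). The function name `to_one_letter`, the regular-expression encoding of the consensus, and the choice of a fragment from SEQ ID NO: 8 are assumptions made for this sketch and are not part of the disclosure.

```python
import re

# Standard three-letter to one-letter amino acid codes (Xaa = any residue).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Xaa": "X",
}

# Hypothetical regex encoding of the SEQ ID NO: 1 consensus:
# '.' stands for Xaa (any residue), [ILVFY] for the (H)Xaa positions.
CONSENSUS = re.compile(r"R.[ILVFY]L..W[ILVFY]E.Q.W")


def to_one_letter(listing_text):
    """Convert listing text to a one-letter string.

    Position numbers are ignored, and tokens fused by line wrapping
    (e.g. "ProTrp") are still split, because each residue code is
    matched as one capital letter followed by two lowercase letters.
    Tokens that are not residue codes are skipped.
    """
    tokens = re.findall(r"[A-Z][a-z]{2}", listing_text)
    return "".join(THREE_TO_ONE[t] for t in tokens if t in THREE_TO_ONE)


# Fragment copied from the SEQ ID NO: 8 listing, numbering included.
fragment = (
    "Phe Pro Met Glu Leu Arg Gln 20 25 30 Phe Leu Ala ProTrp Ile Glu "
    "Ser Gln Asp Trp Ala Tyr Ala Ala Ser 35 40 45"
)

one_letter = to_one_letter(fragment)
match = CONSENSUS.search(one_letter)
print(one_letter)
print(match.group() if match else "no consensus match")
```

The same scan could be run over each full sequence in the listing; a match indicates only that the consensus pattern is present, not that the surrounding region folds as the N-terminal domain.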

What is claimed is:
 1. A crystal of an N-terminal domain of a STAT protein, wherein the crystal effectively diffracts X-rays for the determination of the atomic coordinates of the N-terminal domain of the STAT protein to a resolution of greater than 5.0 Angstroms.
 2. The crystal of claim 1 wherein the crystal effectively diffracts X-rays for the determination of the atomic coordinates of the N-terminus to a resolution of greater than 3.0 Angstroms.
 3. The crystal of claim 1 wherein the N-terminal domain comprises the amino acid sequence: Arg Xaa ^(H)Xaa Leu Xaa Xaa Trp ^(H)Xaa Glu Xaa Gln Xaa Trp (SEQ ID NO: 1) wherein: ^(H)Xaa is either Ile, Leu, Val, Phe, or Tyr.
 4. The crystal of claim 3 wherein the N-terminal domain of the STAT protein is contained in a peptide fragment that consists of 100 to 150 amino acids.
 5. The crystal of claim 4 wherein the N-terminal domain of the STAT protein comprises amino acids 4-112 of SEQ ID NO: 2.
 6. The crystal of claim 5 wherein the crystal effectively diffracts X-rays for the determination of the atomic coordinates of the N-terminus to a resolution of 1.45 Angstroms.
 7. The crystal of claim 5 having a space group of P6₅22 and a unit cell of dimensions a=79.51 Å, b=79.51 Å, and c=84.68 Å.
 8. The crystal of claim 1 having secondary structural elements comprising eight helices (α1-α8) that are assembled into a hook-like structure having an inner and outer surface, wherein: (a) the first four helices (α1-α4) form a ring-shaped element having a proximal and a distal surface; (b) helices six (α6) and seven (α7) form an anti-parallel coiled-coil having a proximal and a distal surface; (c) helix five (α5) connects the ring-shaped element to the anti-parallel coiled-coil; and (d) helix eight (α8) is wrapped around the distal surface of the ring-shaped element; wherein the inner surface of the hook-like structure is formed by the intersection of the proximal surface of the ring-shaped element with the proximal surface of the anti-parallel coiled-coil.
 9. A method of using the crystal of claim 1 in a drug screening assay comprising: (a) selecting a potential drug by performing rational drug design with the three-dimensional structure determined for the crystal, wherein said selecting is performed in conjunction with computer modeling; (b) contacting the potential drug with a dimeric STAT protein N-terminal domain; and (c) detecting the binding of the potential drug with the N-terminal domain; wherein a drug is selected that binds to the N-terminal domain.
 10. The method of claim 9 wherein contacting the potential drug with a dimeric STAT protein N-terminal domain is performed with a STAT protein fragment containing a STAT protein N-terminal domain comprising SEQ ID NO: 1.
 11. The method of claim 10 wherein the STAT protein fragment is labeled.
 12. The method of claim 10 wherein the STAT protein fragment is bound to a solid support.
 13. A method of using the crystal of claim 1 in a drug screening assay comprising: (a) selecting a potential drug by performing rational drug design with the three-dimensional structure determined for the crystal, wherein said selecting is performed in conjunction with computer modeling; (b) contacting the potential drug with two or more dimeric STAT proteins in the presence of a nucleic acid containing at least two adjacent weak binding sites for STAT protein dimers; and (c) detecting the effect of the potential drug on the binding of the dimeric STAT proteins to each other and/or to the nucleic acid; wherein a potential drug is selected as a candidate drug when it either enhances or diminishes the binding of the dimeric STAT proteins to each other and/or the nucleic acid.
 14. The method of claim 13 further comprising: (d) growing a supplemental crystal containing a protein-drug complex formed between the dimeric N-terminal domain and the candidate drug, wherein the crystal effectively diffracts X-rays for the determination of the atomic coordinates of the protein-ligand complex to a resolution of greater than 5.0 Angstroms; (e) determining the three-dimensional structure of the supplemental crystal with molecular replacement analysis; and (f) selecting a drug by performing rational drug design with the three-dimensional structure determined for the supplemental crystal, wherein said selecting is performed in conjunction with computer modeling.
 15. A method for identifying a drug that enhances or diminishes the ability of STAT protein dimers to induce the expression of a gene operably under the control of a promoter containing at least two adjacent weak binding sites for STAT protein dimers comprising: (a) selecting a potential drug by performing rational drug design with the three-dimensional structure determined for the crystal of claim 1, wherein said selecting is performed in conjunction with computer modeling; (b) measuring the level of expression of a first reporter gene and a second reporter gene contained by a host cell in the presence and absence of the potential drug; wherein the first reporter gene is operably linked to a first promoter containing at least two adjacent weak binding sites for STAT protein dimers, and the second reporter gene is operably linked to a second promoter comprising at least one strong binding site for a STAT protein dimer; wherein the binding of STAT protein dimers to the two adjacent weak binding sites induces the expression of the first reporter gene, and wherein the binding of the STAT protein dimer to the strong binding site induces the expression of the second reporter gene; and wherein the host cell contains STAT protein dimers; and (c) comparing the level of expression of the first reporter gene with that of the second reporter gene in the presence and absence of the potential drug, wherein when the presence of the potential drug results in an increase in the level of expression of the first reporter gene but not that of the second reporter gene, the potential drug is identified as a drug that enhances the ability of STAT protein dimers to induce the expression of a gene operably under the control of a promoter containing at least two adjacent weak binding sites for STAT protein dimers; and when the presence of a potential drug results in a decrease in the level of expression of the first reporter gene but not that of the second reporter gene the potential drug is identified as a drug that inhibits the ability of STAT protein dimers to induce the expression of a gene operably under the control of a promoter containing at least two adjacent weak binding sites for STAT protein dimers.
 16. The method of claim 15 wherein the host cell is a mammalian cell.
 17. The method of claim 15 wherein the first reporter gene is contained by a first host cell, and the second reporter gene is contained by a second host cell; and wherein the first host cell and second host cell both contain STAT protein dimers.
 18. The method of claim 15 wherein the weak STAT binding sites are selected from the group consisting of sites present in the regulatory regions of the MIG gene, the c-fos gene and the interferon-γ gene.
 19. The method of claim 15 further comprising: (d) growing a supplemental crystal containing a protein-drug complex formed between the dimeric N-terminal domain and the drug, wherein the crystal effectively diffracts X-rays for the determination of the atomic coordinates of the protein-ligand complex to a resolution of greater than 5.0 Angstroms; (e) determining the three-dimensional structure of the supplemental crystal with molecular replacement analysis; and (f) selecting a drug by performing rational drug design with the three-dimensional structure determined for the supplemental crystal, wherein said selecting is performed in conjunction with computer modeling.
 20. A method for identifying a drug that modulates the ability of adjacent STAT protein dimers to interact and bind to adjacent DNA binding sites comprising: (a) selecting a potential drug by performing rational drug design with the three-dimensional structure determined for the crystal of claim 1, wherein said selecting is performed in conjunction with computer modeling; (b) measuring the binding affinity of the STAT protein, or a fragment thereof that comprises the N-terminal domain, to a nucleic acid comprising 2 adjacent weak STAT DNA binding sites in the presence and absence of the potential drug; (c) measuring the binding affinity of the STAT protein, or the fragment, to a nucleic acid comprising a single strong STAT binding site in the presence and absence of the potential drug; and (d) comparing the binding affinity measured in step (b) in the presence and absence of the potential drug with the binding affinity measured in step (c) in the presence and absence of the potential drug, wherein a potential drug which causes an increase in the binding affinity measured in step (b) but not in the binding affinity measured in step (c) is identified as a drug that enhances the interaction between adjacent activated STAT dimers, and a potential drug which causes a decrease in the binding affinity measured in step (b) but not in the binding affinity measured in step (c) is identified as a drug that inhibits the interaction between adjacent activated STAT dimers.
 21. A method for identifying a drug that modulates the ability of adjacent STAT protein dimers to interact and bind to adjacent DNA binding sites comprising: (a) measuring the binding affinity of the STAT protein comprising the N-terminal domain to a nucleic acid comprising two adjacent weak STAT DNA binding sites in the presence and absence of a potential drug; (b) measuring the binding affinity of a modified form of the STAT protein lacking the α4-tryptophan of the N-terminal domain with the nucleic acid in the presence and absence of the potential drug; and (c) comparing the binding affinity measured in step (a) in the presence and absence of the potential drug with the binding affinity measured in step (b) in the presence and absence of the potential drug, wherein a potential drug which causes an increase in the binding affinity measured in step (a) but not in the binding affinity measured in step (b) is identified as a drug that enhances the interaction between adjacent activated STAT dimers, and a potential drug which causes a decrease in the binding affinity measured in step (a) but not in the binding affinity measured in step (b) is identified as a drug that inhibits the interaction between adjacent activated STAT dimers.
 22. A method of making the crystal of claim 1 comprising placing an aliquot of a solution containing the STAT N-terminal domain fragment on a cover slip as a hanging drop above a well containing a reservoir buffer that comprises 0.2 M Na⁺CH₃COO⁻, 0.1 M Tris/HCl pH 8.0, 17% PEG4000.
 23. The method of claim 22 wherein said solution is prepared by combining a preparation of the STAT N-terminal domain fragment that contains 20 mg/ml of the STAT N-terminal domain fragment in 50 mM Hepes/HCl pH 8.0, 150 mM KCl, 2.5 mM CaCl₂, and 5 mM DTT with an equal volume of the reservoir buffer; and wherein the aliquot comprises approximately 1 μl of said solution.
 24. The method of claim 23 wherein the STAT N-terminal domain fragment is the STAT-4 N-terminal domain fragment.
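
The hanging-drop recipe of claims 22 and 23 combines the protein preparation with an equal volume of reservoir buffer, so each component starts in the drop at half its stock concentration. The following Python sketch works through that arithmetic as an illustration only. It assumes ideal volume additivity and describes the drop at the moment of mixing, before any equilibration against the well; the function name `mix_equal_volumes` and the dictionary labels are hypothetical and are not part of the claimed method.

```python
def mix_equal_volumes(solution_a, solution_b):
    """Return component concentrations after a 1:1 (v/v) mix.

    Each input maps a component label to its concentration in that
    stock. Assumes volumes are additive, so every component is diluted
    two-fold; a component present in both stocks is averaged.
    """
    mixed = {}
    for solution in (solution_a, solution_b):
        for name, concentration in solution.items():
            mixed[name] = mixed.get(name, 0.0) + concentration / 2.0
    return mixed


# Protein preparation as recited in claim 23.
protein_prep = {
    "STAT N-terminal domain fragment (mg/ml)": 20.0,
    "Hepes/HCl pH 8.0 (mM)": 50.0,
    "KCl (mM)": 150.0,
    "CaCl2 (mM)": 2.5,
    "DTT (mM)": 5.0,
}

# Reservoir buffer as recited in claim 22.
reservoir = {
    "sodium acetate (M)": 0.2,
    "Tris/HCl pH 8.0 (M)": 0.1,
    "PEG 4000 (%)": 17.0,
}

drop = mix_equal_volumes(protein_prep, reservoir)
for name, concentration in drop.items():
    print(f"{name}: {concentration:g}")
```

Under these assumptions the drop starts at 10 mg/ml protein and 8.5% PEG 4000, half the reservoir's precipitant concentration.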